DistributedDataMining
distributedDataMining (dDM) is a research project that uses Internet-connected computers to perform research in various fields of data analysis and machine learning. The project uses the Berkeley Open Infrastructure for Network Computing (BOINC) to distribute research-related tasks to many computers. The intent of BOINC is to enable researchers to tap into the enormous processing power of personal computers around the world. If you are willing to support our research, please participate in the dDM project: register, then download the BOINC software and Java 1.6. After installing and starting BOINC, enter the following project URL: http://ddm.nicoschlitter.de/DistributedDataMining/. Please visit our forum to discuss dDM-related issues.
All dDM applications use the open source framework RapidMiner. This data mining suite, developed at the University of Dortmund, provides various machine learning methods for data analysis. RapidMiner offers a convenient plug-in mechanism that makes it easy to add newly developed algorithms. This flexibility, combined with the processing power of BOINC, is an ideal foundation for scientific distributed data mining. The dDM project takes that opportunity and serves as a metaproject for different kinds of machine learning applications. Below you will find a list of our subprojects and the related scientific publications.
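How to Join the Project
The project runs on the BOINC platform. The steps to join are outlined below (skip any step you have already completed):
- Download and install the BOINC client software (from the official download page or the program download page)
- Click the "Add Project" button in the client's simple view, or choose "Tools -> Add Project" from the menu in the advanced view; a wizard dialog appears
- Click Next, then find and select the distributedDataMining project in the project list (if the project is not listed, enter the project URL in the edit box: http://www.distributeddatamining.org/ ), then click Next
- Enter an email address you can use and set a login password for this project (not the password of your email account)
- Click Next again; if the project server is working properly (and offers an application for your operating system), you have joined the project
For more detailed instructions, see the BOINC Beginner's Guide or the BOINC Tutorial.
This site recommends joining Team China: visit the team search page on the project's official website, search for Team China and open its team page, then click Join and enter your login details.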
Time Series Prediction
Stock Price Prediction (active)
Part of our research is devoted to Time Series Analysis. Our focus is on forecasting economic time series such as the DAX and the Dow Jones. At first, we applied artificial neural networks to forecast time series. A detailed description of this approach, the design of the experimental setting, and the results are presented in [4]. Later on, we applied support vector machines to avoid the high computational complexity of neural networks. The resulting forecasts are of comparable quality, while the computational cost is significantly lower. In 2008, we published two related studies, [5] and [6]. We extended our studies by using various learning algorithms in order to determine their applicability for stock price prediction. After analyzing the obtained results we made two important observations: (i) the influence of the learning algorithm is much lower than expected, whereas (ii) the training window size has a stronger impact on the quality of the prediction. Since temporal effects have so far rarely been addressed in the literature, our dDM project concentrates on the study of these temporal aspects in time series analysis.
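The role of the training window size can be illustrated with a small walk-forward evaluation. The following Python sketch is not the dDM code (the project runs on RapidMiner); the function names and the two toy predictors are assumptions chosen only for illustration. It forecasts each value from the preceding `window` observations and reports the mean absolute error, so that different window sizes and different learners can be compared on the same series.

```python
from statistics import mean


def walk_forward_mae(series, window, predict):
    """Mean absolute error of one-step-ahead forecasts.

    For every position t >= window, the predictor sees only the
    `window` values directly before t (a sliding training window).
    """
    if window < 1 or window >= len(series):
        raise ValueError("window must be in [1, len(series) - 1]")
    errors = []
    for t in range(window, len(series)):
        train = series[t - window:t]
        errors.append(abs(predict(train) - series[t]))
    return mean(errors)


def window_mean(train):
    """Toy predictor (assumption): forecast the mean of the window."""
    return mean(train)


def linear_trend(train):
    """Toy predictor (assumption): least-squares line, extrapolated one step."""
    n = len(train)
    x_bar = (n - 1) / 2
    y_bar = mean(train)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    if sxx == 0:  # a single point has no slope; repeat the last value
        return train[-1]
    slope = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(train)) / sxx
    return y_bar + slope * (n - x_bar)
```

Calling `walk_forward_mae(prices, w, linear_trend)` for several values of `w`, and again with `window_mean`, gives a simple table of errors by window size and by learner, which is the kind of comparison described above.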
Social Network Analysis
Tanja Falkowski proposed DenGraph, a density-based graph clustering algorithm. Among other things, this algorithm can be used for Social Network Analysis. The following studies were part of her PhD thesis, which has been published as a book.
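To give an idea of what density-based clustering on a graph means, here is a minimal Python sketch in the style of DBSCAN applied to a weighted graph. It is a simplified illustration and should not be read as the published DenGraph or DenGraph-IO algorithm; the parameter names `eps` (maximum distance to a neighbor) and `eta` (minimum number of such neighbors for a core node), the adjacency-dictionary input format, and the function names are assumptions.

```python
from collections import deque


def eps_neighbors(graph, node, eps):
    """Adjacent nodes whose edge distance to `node` is at most eps."""
    return [v for v, dist in graph[node].items() if dist <= eps]


def density_cluster(graph, eps, eta):
    """Group nodes into dense clusters; everything else is noise.

    graph: {node: {neighbor: distance}}, assumed symmetric.
    A node is a core node if it has at least `eta` eps-neighbors.
    Clusters grow from core nodes; non-core nodes reached from a core
    node join its cluster as border nodes but are not expanded.
    """
    labels = {}
    cluster_id = 0
    for start in graph:
        if start in labels or len(eps_neighbors(graph, start, eps)) < eta:
            continue
        labels[start] = cluster_id
        queue = deque([start])  # holds core nodes only
        while queue:
            u = queue.popleft()
            for v in eps_neighbors(graph, u, eps):
                if v in labels:
                    continue
                labels[v] = cluster_id
                if len(eps_neighbors(graph, v, eps)) >= eta:
                    queue.append(v)
        cluster_id += 1
    noise = [node for node in graph if node not in labels]
    return labels, noise
```

In a social network setting, the distance between two users could be derived from how rarely they interact, so that frequently interacting groups form clusters and weakly connected users end up as noise.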
Temporal Dynamics of the Last.fm Music Platform (temporarily suspended)
In this application we applied DenGraph-IO to detect and observe changes in the music listening behaviour of Last.fm users over a period of two years. The aim was to see whether the proposed clustering technique detects meaningful communities and evolutions [1], [2].
Temporal Evolution of Communities in the Enron Email Data Set (finished)
The collapse of Enron, a U.S. company honored for six consecutive years by "Fortune" as "America's Most Innovative Company", caused one of the biggest bankruptcy cases in U.S. history. To investigate the case, a data set of approximately 1.5 million e-mails sent or received by Enron employees was published by the Federal Energy Regulatory Commission. We have used the processing power of dDM to analyze the temporal evolution of communities extracted from this e-mail correspondence [3].
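Studying how communities evolve over time requires splitting the correspondence into time slices first. The Python sketch below shows one plausible way to do this and is an assumption, not the procedure used in [3]: the input format (date, sender, recipient), the monthly granularity, and the function name are all hypothetical. Each month becomes a weighted, undirected graph in which the weight of an edge is the number of e-mails exchanged between two people.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_snapshots(emails):
    """Build one weighted communication graph per calendar month.

    emails: iterable of (sent_date, sender, recipient) tuples.
    Returns {(year, month): Counter mapping (person_a, person_b) -> count},
    where each pair is stored in sorted order so direction is ignored.
    """
    snapshots = defaultdict(Counter)
    for sent, sender, recipient in emails:
        if sender == recipient:
            continue  # self-addressed mail carries no community signal
        pair = tuple(sorted((sender, recipient)))
        snapshots[(sent.year, sent.month)][pair] += 1
    return dict(snapshots)
```

Running a clustering algorithm on each snapshot in chronological order, and then comparing the clusters of consecutive months, yields a view of communities that appear, grow, shrink, or dissolve.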
References
1. Schlitter N, Falkowski T. Mining the Dynamics of Music Preferences from a Social Networking Site. In: Proceedings of the 2009 International Conference on Advances in Social Network Analysis and Mining. Athens: IEEE Computer Society; 2009. p. 243-8.
2. Falkowski T, Schlitter N. Analyzing the Music Listening Behavior and its Temporal Dynamics Using Data from a Social Networking Site. Zurich; 2008.
3. Falkowski T. Community Analysis in Dynamic Social Networks. Goettingen: Sierke Verlag; 2009.
4. Schlitter N. Analyse und Prognose ökonomischer Zeitreihen: Neuronale Netze zur Aktienkursprognose. Saarbrücken: VDM Verlag Dr. Müller; 2008.
5. Schlitter N. A Case Study of Time Series Forecasting with Backpropagation Networks. In: Steinmüller J, Langner H, Ritter M, Zeidler J, editors. 15 Jahre Künstliche Intelligenz an der TU Chemnitz. Chemnitz: Techn. Univ. Chemnitz, Fak. für Informatik; 2008. p. 203-17. (Chemnitzer Informatik-Berichte).
6. Möller M, Schlitter N. Analyse und Prognose ökonomischer Zeitreihen mit Support Vector Machines. In: Steinmüller J, Langner H, Ritter M, Zeidler J, editors. 15 Jahre Künstliche Intelligenz an der Fakultät für Informatik. Chemnitz: Techn. Univ. Chemnitz, Fak. für Informatik; 2008. p. 189-201. (Chemnitzer Informatik-Berichte).