For a comprehensive manual on organizing challenges, see the Organizer's Guide.
Who can start a challenge and for what purpose?
A Student Challenge can be launched by:
- A tutor of laboratory classes - to organize an assignment in the form of a contest, making it more attractive and valuable for students and avoiding manual marking of assignments.
- A team of programmers in a company, or a team of researchers at a university or laboratory - to collaborate on a given algorithmic problem in an easy way: everyone can independently investigate different approaches while sharing intermediate solutions with peers, so that each other's ideas are evaluated automatically and compared along the way.
A Scientific Challenge can be launched by:
- A group of researchers - to pose an open scientific problem that anyone can tackle; to attract the attention of the scientific community, focus the work of research teams from different countries and universities, help disseminate the best ideas among the community, and establish a common benchmark for future research in the field.
- An enterprise - to engage the scientific community in investigating a real-world problem faced by the company, and to boost Human Resources activities: finding the best talent in the field and promoting the company among prospective employees. (Note: intellectual property (IP) cannot be transferred as a result of a Scientific Challenge. See Industrial Challenge below for this purpose.)
An Industrial Challenge can be launched by:
- An enterprise that owns large volumes of data and seeks the most efficient way to explore it and extract useful, actionable knowledge, in order to improve marketing, sales and other business processes.
- An enterprise that designs advanced software and seeks a highly efficient, intelligent algorithm to become an essential part of the product.
- An enterprise that wants to improve an algorithm it has already designed, to achieve the highest possible quality and performance. Example: the $1 Million Netflix Prize contest.
See also the comparison of challenge types.
How to easily find a dataset for a Student Challenge?
For an educational contest, it's easiest to take a publicly available dataset - there's no need to use original unpublished data, as in Scientific or Industrial challenges. Nearly 1000 datasets from different sources can be found in the TunedIT Repository. For example, look at the list of ARFF data files or UCI data in the Repository.
After you pick a dataset, the next step is to introduce distortions. This is necessary if you want to prevent students from googling your original dataset, either by searching the web for textual occurrences of a selected value ("832.76") or by tracking down a file with exactly the same number of attributes and samples. Distortions can be very simple; for instance, you can:
- Linearly scale attribute values: y = Ax + B (A and B are arbitrary constants, with A nonzero, and can be different for each attribute).
- Shuffle the order of attributes and samples.
- Remove a small number of selected samples/attributes or add artificial ones, to distort the overall size of the file.
These operations can be performed easily in a spreadsheet.
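If you prefer a script to a spreadsheet, the distortions listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a TunedIT tool: the function name distort, its parameters, the ranges of the constants A and B, and the convention that the last column of each row holds the class label (left unscaled and kept in the last position) are all hypothetical choices made for the example.

```python
import random

def distort(rows, seed=0, n_drop=2):
    """Return a distorted copy of a dataset.

    Assumptions (hypothetical, for illustration only):
    - rows is a list of samples, each a list of numeric attribute
      values followed by a class label in the last position;
    - the label is neither scaled nor moved.
    """
    rng = random.Random(seed)
    n_attrs = len(rows[0]) - 1

    # Linear scaling y = A*x + B, with separate nonzero A and B per attribute.
    scale = [rng.uniform(0.5, 2.0) for _ in range(n_attrs)]
    shift = [rng.uniform(-10.0, 10.0) for _ in range(n_attrs)]

    # One permutation of attribute positions, shared by all samples.
    perm = list(range(n_attrs))
    rng.shuffle(perm)

    out = []
    for row in rows:
        values, label = row[:-1], row[-1]
        scaled = [scale[j] * float(values[j]) + shift[j] for j in range(n_attrs)]
        out.append([scaled[j] for j in perm] + [label])

    # Shuffle the order of samples, then drop a few to change the file size.
    rng.shuffle(out)
    return out[n_drop:]
```

The fixed seed makes the result reproducible, which is convenient if you need to regenerate exactly the same distorted file later. Keep the constants and the permutation private, since they allow the original values to be recovered.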
Finally, when the dataset is preprocessed, you can upload it to the challenge and split it automatically with the Data Wizard.