How many people participate in a challenge?
Typically, several hundred teams register and download training data. About 100 of them come up with at least one submitted solution. Most of them work on the problem throughout the whole duration of the challenge and submit improved solutions many times.
You may take a look at the Leaderboard of the IEEE ICDM Contest. There were 575 registered teams, of whom 100 submitted solutions, most of them many times: the total number of solutions was close to 5000.
Note that these figures count only teams, and a team may have more than one member. The total number of participants across all teams can be twice as large as the number of teams.
Why do so many people participate if only a few of them receive awards and the rest get nothing?
The award is not the only motivation to participate. Other reasons are equally important:
- To have FUN… In case you didn't know: data crunching is ADDICTIVE! :-)
- To compare and verify skills.
- To get access to new challenging research problems.
- To gain new experiences.
- To gain publicity, add strong points to their Curriculum Vitae, etc.
It is very important that a preliminary Leaderboard is available throughout the entire contest, so that from the very beginning participants roughly know their chances of winning and can estimate the effort needed to reach a good final standing.
Who are the participants? Are they skilled enough to come up with valuable solutions?
You won't find so many talented people anywhere else. TunedIT is the place where data scientists, practitioners and students work, collaborate, teach, learn and have fun. We cooperate closely with academia and the scientific community:
- We host competitions for top conferences and scientific organizations in data mining and statistics.
- We provide our contest platform for free to all tutors who wish to launch contests for educational purposes.
- We provide advanced, cutting-edge tools for running data mining experiments and conducting high-quality, reproducible research.
- We provide a large repository of data mining resources - datasets, algorithms, experimental results - that can be freely used and employed in research by everyone.
A typical member holds a PhD or is a PhD candidate in a field of Computer Science related to data mining, machine learning or statistics (or their applications: bioinformatics, signal processing etc.), is between 25 and 35 years old, and works in academia, in a software company or in a data mining consulting company. There is also a significant number of professors and experienced researchers who lead teams of several of their PhD students. To learn more, visit the Leaderboards of our scientific contests, where some participants provide links to their home pages. For instance, browse the Leaderboards of the IEEE ICDM Contest or the RSCTC Discovery Challenge.
TunedIT members come from over 80 countries and from universities and research institutes worldwide, including prestigious organizations such as:
- Yale University
- Stanford University
- University of Cambridge
- University of Oxford
- University of Toronto
- University of Tokyo
- Stockholm University
- Max Planck Inst. for Biological Cybernetics
- IBM T.J. Watson Research Center
- AT&T Labs Research
- Microsoft Innovation Lab
as well as many other universities in Asia, South America and Eastern Europe that are less widely known.
It's worth noting that participants from less well-known countries and organizations very often have skills comparable to those from top institutes. What's more, these participants are often more motivated to win, because taking part in TunedIT challenges may be their only opportunity to present their skills and gain international recognition.
This wide diversity of backgrounds, experiences and ideas means that the contest problem is approached from many different perspectives, and that the best solution is tracked down by one of the participating teams.
How long does it take: the challenge itself and its configuration?
Typically, if you already have a dataset prepared, configuring the challenge takes no more than a week and essentially boils down to writing descriptions of the task and the dataset. Once the descriptions are ready, configuring the other settings takes 5 minutes.
The challenge itself typically lasts 2-3 months, but 1 month is also enough if you're in a hurry to receive the solution. The challenge may need to last longer only if your problem is atypical and requires substantial knowledge of a given application domain (things that participants will have to learn during the contest) or is technically difficult (for example, the algorithm operates on complex data structures).
How much does it cost?
Less than you might expect! Typically, the total cost (prizes and hosting fee) of an Industrial Challenge is comparable to hiring 1-2 specialists on-site to do the job. For this price you get a group of 100+ specialists from different countries and universities investigating your problem in parallel and trying many different approaches and algorithms. See Three Challenges for detailed information about challenge types and pricing.
Why launch a contest if I can hire an employee instead?
By launching a contest you gain access to 100+ specialists who will investigate your problem in parallel. Each specialist has a different background, knowledge base and ideas, and each follows a different path to solve the problem. Collectively, they can find the best possible solution. There is no better way to track down the most efficient and precisely fine-tuned algorithm.
If you were to build a similar team of specialists on-site and carry out the same research in the traditional way, the cost would climb to $1 Million or more. By organizing a competition, you can get a Return on Investment (ROI) as high as 1:50!
- Prizes encourage people to take risks in order to get results.
- No need to rent office space, buy hardware and software, pay countless taxes, …
- No need to waste time searching for good specialists to hire on-site: distributing job advertisements, reviewing resumes etc.
- Publicity, PR and HR benefits for the company.
Our datasets are sensitive and we keep them confidential. Do we have to disclose them in order to launch a contest?
All companies are "by default" afraid of disclosing any data, of any kind and in any form - that's normal. However, in 95% of cases these concerns are unjustified, and even a quick investigation of the possible threats shows them to be irrelevant. Moreover, every threat must be assessed in the context of the potential benefits - cost savings, financial gains, higher customer satisfaction etc. - resulting from top-grade algorithms or predictive models designed by participants. These benefits are very large and greatly outweigh any possible threats.
For the contest, it's necessary to disclose at least a small part of the data, so that participants can investigate it and find hints for the design of effective algorithms. However, note that:
- Contest datasets are never the raw data collected by the company, but a heavily preprocessed version of it. In most cases, all sensitive information can be stripped out. For instance, if the algorithm predicts the lifetime value of a customer, it doesn't need to know the family name or home address of the person in question.
- In the case of transactional or time series data, you don't have to disclose the latest records. Frequently it's enough to show data from the past, which is already known or has no business value. Only when deploying the winning algorithm do you provide the full, most recent dataset and retrain the model.
- The meaning of attributes can in many cases be kept confidential, without impeding the design of algorithms.
- Values of attributes - be they numeric or symbolic - can be distorted in simple ways, like scaling, that don't influence the quality of algorithms but remove sensitive components of the information.
- The part of the data that needs to be disclosed can be a small fraction of the whole dataset, so it's usually possible to inspect it manually and check that nothing sensitive is revealed.
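The steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool provided by TunedIT: the function name prepare_contest_data, its parameters, and the example field names are all hypothetical, and the records are assumed to be plain dictionaries with a date field.

```python
import random
from datetime import date


def prepare_contest_data(records, drop_fields, date_field, cutoff,
                         scale, sample_fraction, seed=0):
    """Return (released_rows, alias) for a list of dict records.

    Hypothetical helper: every name here is an assumption made for
    illustration only.
    """
    if not records:
        return [], {}

    # Keep only past records; anything on or after `cutoff` stays private.
    past = [r for r in records if r[date_field] < cutoff]

    # Disclose only a fraction of the remaining rows.
    rng = random.Random(seed)
    k = min(len(past), max(1, round(len(past) * sample_fraction)))
    sample = rng.sample(past, k)

    # Strip sensitive fields and hide the meaning of the kept ones
    # behind opaque names (attr_0, attr_1, ...).
    kept = sorted(f for f in records[0] if f not in drop_fields)
    alias = {name: f"attr_{i}" for i, name in enumerate(kept)}

    released = []
    for r in sample:
        row = {}
        for name in kept:
            value = r[name]
            # Distort numeric values by a secret scaling factor.
            if name in scale:
                value = value * scale[name]
            row[alias[name]] = value
        released.append(row)
    return released, alias


# Example with made-up customer records.
customers = [
    {"name": "A. Smith", "address": "1 Main St",
     "date": date(2009, 5, 1), "spend": 120.0, "segment": "gold"},
    {"name": "B. Jones", "address": "2 High St",
     "date": date(2010, 7, 1), "spend": 80.0, "segment": "basic"},
]
rows, alias = prepare_contest_data(
    customers,
    drop_fields={"name", "address", "date"},
    date_field="date",
    cutoff=date(2010, 1, 1),
    scale={"spend": 0.37},
    sample_fraction=1.0,
)
print(rows)
```

The alias mapping and the scaling factors stay with the company, so the winning model can later be retrained on the full, undistorted dataset. A manual inspection of the released rows is still advisable before publishing them.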
All in all, even if your competitors could somehow benefit from the contest data, you will benefit far more from the contest outcome. Your company will receive a top-grade algorithm or predictive model, perfectly suited to your data and solving your business problem with unprecedented accuracy. Competitors' gains from knowing the data will be a mere fraction of your gains from having the top-quality algorithm in hand.