Kaggle

Kaggle in a Nutshell

“The problems we currently face can’t be solved at the level of thinking that created them” -Albert Einstein

What is Kaggle

Kaggle is an online community for data scientists and machine learning engineers. It is a subsidiary of Google. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

Who Created Kaggle

Kaggle was founded by Anthony Goldbloom, a brilliant economist (well, technically he’s an econometrics expert; they’re the same thing, right? :( ). His objective was to bring large, open data to the masses through crowdsourcing. According to Goldbloom, Kaggle has united data scientists and businesses in a meaningful way.

“Without the discipline of a wife to come home to you just end up walking all the time” -Anthony Goldbloom

His concept did not initially receive sufficient backing in Australia, so he decided to relocate to Silicon Valley in the United States. At a recent tech conference, Goldbloom expressed his surprise at how much talent was available yet inaccessible to companies before the inception of Kaggle.

A few years ago, Kaggle announced that it had passed 1 million users, or Kagglers (and yes, it does sound like the name of a cereal). The Kaggle community spans 194 countries (basically, wherever there are internet access, data scientists, and machine learning engineers, there’s Kaggle). It has over 536,000 active members and receives close to 150,000 submissions per month. Kaggle was started in Melbourne, Australia, and moved to Silicon Valley in 2011.

The site raised some 11 million dollars from the likes of Hal Varian (Chief Economist at Google), Max Levchin (PayPal), Index Ventures, and Khosla Ventures, and was then acquired by Google in March 2017. Kaggle is the number one stop for data science enthusiasts all around the world, who compete for prizes and to boost their Kaggle rankings. The best are called Grandmasters (like in chess). There are currently 219 Kaggle Grandmasters in the world, but the list is updated as soon as someone obtains Grandmaster status, and Kaggle maintains a real-time list.

Why the Kaggle Community is Interesting

Kaggle contains interesting and challenging projects where contributors can learn and practice. It is an especially good place for beginners trying to break into the machine learning and data science fields. Kaggle also serves as a platform for companies that want to hire machine learning engineers.

With Kaggle, you get access to top experts in the data science and machine learning fields. Apart from projects and competitions, Kaggle also hosts live discussions among its members, and these discussions can be very informative.

Kaggle also gives its users the chance to become part of the largest data community in the world. The platform is trusted by some of the most data-centric companies globally, such as Walmart.

Kaggle Competitions: How They Work

The host of the competition is in charge of preparing the data and a detailed description of the problem at hand. To make this more convenient for hosts, Kaggle offers an additional consulting service that can help prepare the data and describe the problem in the best possible format. The participants then compete by submitting their models.

All the work is shared on the platform through detailed Kaggle scripts, which inspire new ideas and help others reach better benchmarks. In most Kaggle competitions, submissions are scored immediately and summarised publicly on the live leaderboard. Competitors are not limited to a single attempt at solving a problem. Until the deadline expires, they may revise their submissions as they see fit. This motivates competitors to keep innovating, be creative, and polish their skills to produce better, more elegant, and more effective solutions.
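The scoring-and-revision loop described above can be sketched as a toy leaderboard. This is only an illustration under stated assumptions, not Kaggle's actual scoring system: the `Leaderboard` class, its method names, and the rule that only a competitor's best score counts are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Leaderboard:
    """Toy live leaderboard: keeps each competitor's best score so far."""

    # Assumption: whether a higher or lower score wins depends on the metric.
    higher_is_better: bool = True
    best: dict[str, float] = field(default_factory=dict)

    def submit(self, competitor: str, score: float) -> None:
        """Record a scored submission; a revision only counts if it improves."""
        previous = self.best.get(competitor)
        if previous is None or self._improves(score, previous):
            self.best[competitor] = score

    def _improves(self, new: float, old: float) -> bool:
        return new > old if self.higher_is_better else new < old

    def standings(self) -> list[tuple[str, float]]:
        """Return (competitor, best score) pairs, best first."""
        return sorted(
            self.best.items(),
            key=lambda item: item[1],
            reverse=self.higher_is_better,
        )


board = Leaderboard()
board.submit("ada", 0.81)
board.submit("grace", 0.84)
board.submit("ada", 0.86)    # a revised submission before the deadline
board.submit("grace", 0.80)  # a worse revision does not lower the standing
print(board.standings())     # [('ada', 0.86), ('grace', 0.84)]
```

Keeping only the best score per competitor is one simple way to model "revise as often as you like"; a real competition may rank submissions differently.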

Allowing revisions also raises the accuracy and precision of the final solutions. When the deadline for a competition expires, the host pays the prize money to the winner. In return, the host gains sole ownership of the winning entry, including all intellectual property, and a royalty-free license to use it any way they want.

How are the winners of competitions picked? The host screens participants based on where they are placed on the leaderboard, their final scripts, and the content of the scripts they submitted. Most hosts also take the opportunity to reach out to strong contenders and arrange interviews.

Kaggle Performance Rankings: Understanding the Tiers

You can’t talk about Kaggle without talking about its performance tiers, which are a way of ranking members in the community. You start as a Novice and progress to Contributor, Expert, Master, and finally Grandmaster.

Tiers are earned separately in four categories (Competitions, Notebooks, Datasets, and Discussions), and your highest tier in any of them is displayed on your main profile page. For example, you might be an Expert in Competitions, a Contributor in Notebooks, a Novice in Datasets, and a Master in Discussions, but your main profile page will display Master, which is your highest tier. You advance through the tiers by earning medals, which come from competition results or from the popularity of a notebook, dataset, or comment.
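The "highest tier wins" rule can be sketched in a few lines. This is a minimal illustration that reuses the example profile from the paragraph above; the `TIERS` list and the `displayed_tier` function are hypothetical names, not part of any Kaggle API.

```python
# Tiers in ascending order, as listed in this article.
TIERS = ["Novice", "Contributor", "Expert", "Master", "Grandmaster"]


def displayed_tier(category_tiers: dict[str, str]) -> str:
    """Return the highest tier held across all categories."""
    return max(category_tiers.values(), key=TIERS.index)


profile = {
    "Competitions": "Expert",
    "Notebooks": "Contributor",
    "Datasets": "Novice",
    "Discussions": "Master",
}
print(displayed_tier(profile))  # Master
```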

Ranking Tiers

  • Novice: You are automatically a Novice as soon as you join Kaggle. All you have to do is register. This is the most basic tier, and the only way to go is up.

  • Contributor: Next is the Contributor. You advance to this tier when you have explored Kaggle fully and contributed positively to the community. You must complete various steps before becoming a Contributor, including adding personal information to your profile such as your bio, location, occupation, organization, etc. You also need to verify your account using SMS and engage in all the Kaggle categories by running a script, making a comment, taking part in a competition, casting an upvote, etc.

  • Expert: You become an Expert once you have worked enough on Kaggle to have some experience in Competitions, Notebooks, Datasets, or Discussions. To become an Expert in Competitions, you need 2 bronze medals. For Datasets you need 3 bronze medals, for Notebooks 5 bronze medals, and last but not least, for Discussions 50 bronze medals. Now you might be wondering what these medals are and how you can earn them. Kaggle medals are awarded for excellent and praiseworthy work in each of the categories.

The rules for earning medals vary by category. Competition medals are given for top performance in competitions. Dataset medals are given to popular datasets that receive many upvotes. Notebook and Discussion medals work similarly, as they are also based on a high number of upvotes. If you want to know more, Kaggle publishes the specific rules for earning medals.

  • Master: After becoming an Expert, the next step is Master. You only reach this honour when you demonstrate mastery of at least one of Competitions, Notebooks, Datasets, or Discussions, and possibly more than one. There are also many perks to reaching this stage.

You can now participate in exclusive Master-only competitions that are not available to other people. But how do you get here? To become a Master in Competitions, you need 1 gold medal and 2 silver medals. For Datasets, the requirement is 1 gold medal and 4 silver medals, and for Notebooks, you need 10 silver medals. As for Discussions, it’s 50 silver medals and at least 200 medals in total.

  • Grandmaster: And now comes the Grandmaster tier, the aim of every Novice on Kaggle! You can’t do any better than Grandmaster on Kaggle, so it is not easy to get here. You need to demonstrate outstanding and exemplary performance in at least one of Competitions, Notebooks, Datasets, or Discussions to advance to this tier. Reaching Grandmaster means you are the best of the best, and you need to work hard for that.

To become a Grandmaster in Competitions, you need 5 gold medals as well as a solo gold medal. For Datasets, the requirement is 5 gold medals and 5 silver medals, and for Notebooks, you need 15 gold medals. As for Discussions, it’s 50 gold medals and an insane 500 medals in total.
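The medal thresholds for Expert, Master, and Grandmaster can be collected into one lookup table. The sketch below uses only the numbers quoted in this article and rests on two assumptions that the article does not confirm: a higher medal may stand in for a lower one (each medal counted once), and the solo gold is counted among the 5 gold medals. The `Medals`, `Requirement`, and `medal_tier` names are hypothetical, not a Kaggle API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Medals:
    gold: int = 0
    silver: int = 0
    bronze: int = 0
    solo_gold: int = 0  # assumption: solo golds are also included in `gold`

    @property
    def total(self) -> int:
        return self.gold + self.silver + self.bronze


@dataclass(frozen=True)
class Requirement:
    gold: int = 0
    silver: int = 0
    bronze: int = 0
    solo_gold: int = 0
    total: int = 0

    def met_by(self, m: Medals) -> bool:
        # Assumption: a higher medal may stand in for a lower one,
        # but each medal is counted only once (cumulative check).
        return (
            m.gold >= self.gold
            and m.gold + m.silver >= self.gold + self.silver
            and m.total >= self.gold + self.silver + self.bronze
            and m.solo_gold >= self.solo_gold
            and m.total >= self.total
        )


# Thresholds copied from the tier descriptions above, highest tier first.
REQUIREMENTS = {
    "Grandmaster": {
        "Competitions": Requirement(gold=5, solo_gold=1),
        "Datasets": Requirement(gold=5, silver=5),
        "Notebooks": Requirement(gold=15),
        "Discussions": Requirement(gold=50, total=500),
    },
    "Master": {
        "Competitions": Requirement(gold=1, silver=2),
        "Datasets": Requirement(gold=1, silver=4),
        "Notebooks": Requirement(silver=10),
        "Discussions": Requirement(silver=50, total=200),
    },
    "Expert": {
        "Competitions": Requirement(bronze=2),
        "Datasets": Requirement(bronze=3),
        "Notebooks": Requirement(bronze=5),
        "Discussions": Requirement(bronze=50),
    },
}


def medal_tier(category: str, medals: Medals) -> Optional[str]:
    """Return the highest medal-based tier reached in one category, if any."""
    for tier, by_category in REQUIREMENTS.items():
        if by_category[category].met_by(medals):
            return tier
    return None  # still Novice or Contributor, which do not depend on medals


print(medal_tier("Competitions", Medals(gold=1, silver=2)))  # Master
print(medal_tier("Notebooks", Medals(silver=4, bronze=3)))   # Expert
print(medal_tier("Datasets", Medals(bronze=1)))              # None
```

Storing the thresholds as data rather than as nested `if` statements keeps the table easy to compare against the prose and to update if the rules change.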

Kaggle Learn

Kaggle offers a host of micro-courses for those interested in pursuing a machine learning or data science career. Kaggle Learn isn’t exactly brand spanking new; it launched in January 2018 and has grown by 1500% since (according to Kaggle Learn). However, its short-course approach to practical data skills seems to set it apart from other resources, with courses running in the 3-8 hour range.

Written on February 5, 2023