How to penalize for empty fields in a DataFrame?
I have to calculate the consistency of racing car drivers over a whole season. My DataFrame has 10 columns (one per circuit), and each column holds the standard deviation of the lap times the driver posted at that circuit, in other words, how consistent the driver is from lap to lap. For races the driver did not finish, the field is blank.
So far I have calculated each driver's season consistency by averaging all 10 columns. However, not finishing a race should affect a driver's consistency negatively, and I do not know how to implement that.
pandas data
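The baseline described in the question can be sketched as follows. This is only an illustration: the driver and circuit names and all numbers are made up, and it assumes the blank fields were read in as `NaN`. It shows why the plain average does not penalize an unfinished race: `DataFrame.mean` skips `NaN` by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows are drivers, columns are circuits, values are
# lap-time standard deviations in seconds. NaN marks an unfinished race.
sigma = pd.DataFrame(
    {
        "circuit_a": [0.40, 0.60, 0.50],
        "circuit_b": [0.50, np.nan, 0.70],
        "circuit_c": [0.30, np.nan, 0.60],
    },
    index=["driver_1", "driver_2", "driver_3"],
)

# mean() uses skipna=True by default, so an unfinished race is simply
# left out of the average instead of counting against the driver.
season_avg = sigma.mean(axis=1)

# Number of unfinished races per driver.
dnf_count = sigma.isna().sum(axis=1)
```

Here `driver_2` finished one race out of three, yet the season average is computed from that single value alone.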
asked 2 days ago
jatrp5
1 Answer
This depends heavily on domain knowledge. A general approach is to fill each missing value with one of the following:
- A multiple of the worst or average consistency at each circuit $c$, i.e. $(1 + m)\text{max}(\sigma_c)$ or $(1 + m)\text{avg}(\sigma_c)$ respectively, for the null values at that circuit, or
- A multiple of the worst or average consistency of each driver $d$, i.e. $(1 + m)\text{max}(\sigma_d)$ or $(1 + m)\text{avg}(\sigma_d)$ respectively, for their unfinished races, or
- A multiple of the average of the driver and circuit average consistencies, i.e. $(1 + m)[\text{avg}(\sigma_d) + \text{avg}(\sigma_c)]/2$, for an unfinished race of driver $d$ at circuit $c$, or some other combination.
Whichever approach you choose, the coefficient $m$ affects the final ranking and could be determined either
- Subjectively, by looking at the rankings from an expert point of view and selecting the value that makes the most sense, or
- By trying a range of values such as $m \in \{-0.2, -0.1, 0, 0.1, 0.2, \ldots, 0.5\}$ and averaging the consistencies $\sigma_d$ or rankings $R_d$ for each driver $d$. An advantage of this approach is that when a driver's rank has low variance over the different values of $m$, the rank is insensitive to the choice of $m$, i.e. it is less controversial; when the rank changes a lot with different choices of $m$, the average rank is more controversial.
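A minimal pandas sketch of the average-based fills and the sweep over $m$ might look like the following. The data, the function names `penalty_table` and `penalized_mean`, the `how` labels, and the exact grid of $m$ values are all assumptions made for illustration. It also assumes blanks are stored as `NaN` and that a lower standard deviation means a better (lower) rank. A driver with no finished race at all would keep `NaN` under the driver-based fills.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows are drivers, columns are circuits, NaN = unfinished.
sigma = pd.DataFrame(
    {
        "circuit_a": [0.40, 0.60, 0.50],
        "circuit_b": [0.50, np.nan, 0.70],
        "circuit_c": [0.30, np.nan, 0.60],
    },
    index=["driver_1", "driver_2", "driver_3"],
)


def penalty_table(sigma: pd.DataFrame, m: float, how: str) -> pd.DataFrame:
    """Build a (driver x circuit) table of fill values scaled by (1 + m)."""
    circuit_avg = sigma.mean(axis=0).to_numpy()[None, :]  # shape (1, circuits)
    driver_avg = sigma.mean(axis=1).to_numpy()[:, None]   # shape (drivers, 1)
    if how == "circuit_avg":
        base = np.broadcast_to(circuit_avg, sigma.shape)
    elif how == "driver_avg":
        base = np.broadcast_to(driver_avg, sigma.shape)
    elif how == "both_avg":
        base = (driver_avg + circuit_avg) / 2
    else:
        raise ValueError(f"unknown how: {how!r}")
    return pd.DataFrame((1 + m) * base, index=sigma.index, columns=sigma.columns)


def penalized_mean(sigma: pd.DataFrame, m: float, how: str) -> pd.Series:
    """Season consistency after filling only the NaN cells with the penalty."""
    filled = sigma.fillna(penalty_table(sigma, m, how))
    return filled.mean(axis=1)


# Sweep over m and collect each driver's rank (1 = most consistent).
ms = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
ranks = pd.DataFrame({m: penalized_mean(sigma, m, "both_avg").rank() for m in ms})

# A low rank_std means the driver's rank barely depends on the choice of m.
summary = pd.DataFrame(
    {"mean_rank": ranks.mean(axis=1), "rank_std": ranks.std(axis=1)}
)
```

Swapping `mean` for `max` inside `penalty_table` would give the worst-case variants. Note that a negative $m$ fills unfinished races with a value below the reference, so it acts as a discount rather than a penalty.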
edited 2 days ago
answered 2 days ago
Esmailian