Nan for loss in training #4641
Replies: 7 comments 14 replies
-
I didn't quite understand how you set your data up. Could you provide an example?
-
Could you model your data as a graph, with edge features giving the
distance to the anchor nodes? That way you could quite naturally exclude
the unreachable nodes by excluding those connections from your graph.
If that's not realistic, then you probably need to share the model
architecture.
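A minimal sketch of the idea above, assuming the distances are available as a mapping from (target, anchor) pairs to floats. The function name and the toy data are hypothetical, not from the thread. Unreachable pairs are left out of the edge list, so no inf ever reaches the model.

```python
import math

def build_anchor_edges(distances):
    """distances: dict mapping (target, anchor) -> distance (float, may be inf).

    Returns (edge_index, edge_attr): edge_index is [sources, targets] and
    edge_attr holds one finite distance per kept edge.
    """
    sources, targets, edge_attr = [], [], []
    for (target, anchor), dist in sorted(distances.items()):
        if not math.isfinite(dist):
            continue  # unreachable: leave this connection out of the graph
        sources.append(target)
        targets.append(anchor)
        edge_attr.append([dist])
    return [sources, targets], edge_attr

# Hypothetical toy data: target node 0 and anchor nodes 10, 11, 12.
toy = {(0, 10): 4.0, (0, 11): float("inf"), (0, 12): 3.0}
edge_index, edge_attr = build_anchor_edges(toy)
print(edge_index)  # [[0, 0], [10, 12]]
print(edge_attr)   # [[4.0], [3.0]]
```

The two lists could then be converted to tensors and handed to a message-passing layer that consumes edge features. Which layer fits is left open here.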
…On Sat, 14 May 2022, 4:45 pm JiaruiWang, ***@***.***> wrote:
Sure.
Here are the 156 features for one target node. These features are distances
from the target node to the other 52 anchor nodes. For each anchor node,
the target node has 3 distances: in-direction distance,
out-direction distance, and undirected distance (treating all edges as
undirected). 52 * 3 = 156 distance features. In the in and out
directions, some anchor nodes are not reachable from the target node. These
unreachable distances are inf, and these inf values cause the loss to be nan.
[4., 4., 3., 5., inf, 4., 4., inf, 3., 5., inf, 4., 4., inf, 3., 5., inf,
3.,
5., inf, 3., 5., inf, 3., 4., 3., 3., 5., inf, 4., 5., 5., 4., 3., 4., 3.,
5., inf, 4., 5., 3., 3., 5., inf, 3., 4., inf, 3., 5., 4., 3., 5., inf, 4.,
5., inf, 3., 4., 4., 3., 5., inf, 3., 5., 4., 3., 5., 3., 3., 4., 4., 3.,
5., 4., 3., 4., inf, 4., 4., 4., 3., 5., 5., 3., 3., 4., 2., 5., 5., 4.,
5., inf, 4., 5., 4., 3., 6., inf, 3., 5., inf, 4., 3., 4., 3., 3., 4., 3.,
4., 4., 3., 3., 2., 2., 5., inf, 4., 4., 3., 3., 4., 4., 3., 5., 3., 3.,
5., 4., 3., 5., 5., 3., 4., 4., 3., 3., 4., 3., 5., inf, 4., 5., 4., 3.,
4., 3., 2., 4., 4., 3., 4., 3., 3., 5., inf, 3.]
If I use only undirected distance as the feature, all the anchor nodes
will be reachable from the target node. There will be no inf in the 52
features. The loss is not nan anymore.
[4., 4., 3., 5., 3., 5., 4., 3., 5., 3., 3., 4., 4., 3.,
5., 4., 3., 4., 4., 4., 4., 3., 5., 5., 3., 3., 4., 2., 5., 5., 4.,
5., 4., 5., 4., 3., 6., 3., 5., 4., 3., 4., 3., 3., 4., 3.,
4., 4., 3., 3., 2., 2.]
I want to keep the in direction distance and out direction distance. Is it
possible?
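To illustrate why inf in the inputs can turn the loss into nan, here is a small sketch in plain Python. The mechanism shown is an assumption, since the thread does not include the model code: a linear unit with mixed-sign weights ends up adding +inf and -inf, which is undefined. The weights are made-up values.

```python
import math

inf = float("inf")

# Two unreachable distances fed into one linear unit with mixed-sign weights
# (the weights are made-up values for illustration).
features = [inf, inf, 3.0]
weights = [0.5, -0.3, 0.1]

pre_activation = sum(w * x for w, x in zip(weights, features))
print(pre_activation)              # nan, because inf + (-inf) is undefined
print(math.isnan(pre_activation))  # True

# A zero weight does not help either: 0 * inf is also nan.
print(math.isnan(0.0 * inf))       # True
```

Once one activation is nan, it propagates through every later operation, so the loss is nan from the first forward pass.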
-
Okay, I understand the problem you describe now that I see your model.
I guess if you really want to continue with this model, the easiest thing
to do would be to use a very large number rather than inf. You could also
filter out any inf values before you calculate the loss.
What I was suggesting is that your anchor nodes can also be treated as part
of the graph: instead of building features in x, you can use the graph
structure of your problem.
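A rough sketch of the "large number instead of inf" suggestion, with one addition that is my assumption rather than something stated in the thread: a 0/1 reachability flag per distance, so the model can still tell "unreachable" apart from "far away". The function name and the default sentinel (largest finite distance + 1) are hypothetical choices.

```python
import math

def encode_distances(dists, sentinel=None):
    """Replace inf with a finite sentinel and append a reachability flag per entry.

    If sentinel is None, use (largest finite distance + 1).
    Returns a flat list: finite distances followed by 0/1 reachability flags.
    """
    finite = [d for d in dists if math.isfinite(d)]
    if sentinel is None:
        sentinel = (max(finite) + 1.0) if finite else 0.0
    values = [d if math.isfinite(d) else sentinel for d in dists]
    reachable = [1.0 if math.isfinite(d) else 0.0 for d in dists]
    return values + reachable

row = [4.0, 4.0, 3.0, 5.0, float("inf"), 4.0]  # first six values of the feature vector quoted earlier
encoded = encode_distances(row)
print(encoded)
# [4.0, 4.0, 3.0, 5.0, 6.0, 4.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0]
```

On tensors, torch.nan_to_num with its posinf argument does the replacement step. The flag columns double the feature width (156 to 312 in the setup described here), which may or may not be acceptable.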
…On Sat, 14 May 2022, 5:35 pm JiaruiWang, ***@***.***> wrote:
By the way, the model's accuracy never gets better than
Train: 0.0244, Val: 0.0532, Test: 0.0519
That's too bad. What did I do wrong?
-
I don’t understand what you mean by using the graph structure of the
problem. All the anchor nodes are in the graph.
On Sat, May 14, 2022 at 2:46 AM Padarn Wilson ***@***.***>
wrote:
… Okay I understand the problem you describe now that I see your model.
I guess if you really want to continue with this model, the easiest thing
to do would be to use a very large number rather than inf. You could also
filter out any inf values before you calculate loss.
What I was suggesting is that your anchor nodes can be also considered part
of the graph instead of building features in x, you can use the graph
structure of your problem.
-
Sorry, I must be misunderstanding what you're trying to model here. It would make more sense to me to use edge attributes or weights to carry the distance information, with a model that uses these features in message passing, but perhaps this doesn't make sense for you. If you want to solve your initial problem, I'd suggest filtering out the inf values from your loss before calling backward(); otherwise your gradients will be huge.
-
However, I have run into another problem. The label distribution of my data is very imbalanced. There are 52 label classes in the dataset and 6,000,000 nodes. Most classes account for less than 1.5% of the data, and the largest class is 15% of the total. If I randomly split the dataset into 90% train, 5% validation, and 5% test, the model classifies all the nodes into the largest label class. Do you have any suggestions? Is this underfitting?
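One common thing to try for this kind of imbalance is to weight the loss by inverse class frequency. The sketch below uses the "balanced" heuristic, weight = N / (num_classes * count), on hypothetical toy labels. It is a starting point under those assumptions, not a guaranteed fix.

```python
from collections import Counter

def balanced_class_weights(labels, num_classes):
    """Weight for class c = N / (num_classes * count_c); absent classes get 0."""
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[c]) if counts[c] > 0 else 0.0
        for c in range(num_classes)
    ]

# Hypothetical toy labels: class 0 dominates, classes 1 and 2 are rare.
labels = [0] * 8 + [1] + [2]
weights = balanced_class_weights(labels, num_classes=3)
print(weights)  # rare classes get larger weights than the dominant class
```

The resulting list can be turned into a tensor and passed as the weight argument of torch.nn.CrossEntropyLoss. For reference, always predicting the largest class would give roughly 15% accuracy on the distribution described above.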
Beta Was this translation helpful? Give feedback.
-
I see, then why not use edge attribute features instead of using node
features?
-
Hello,
I am training a GraphSAGE model on a directed graph. The node features are the in, out, and undirected distances to a set of anchor nodes. Some anchor nodes are not reachable from a given node when edge direction is respected (in or out direction), so the respective distances are float('inf'). During training, the loss is nan from the first epoch.
If I remove the in and out distance features, the loss is no longer nan. But the in and out direction distances are important to me.
Is there a workaround to avoid nan loss while keeping the feature information for the unreachable node distances?
Thank you very much
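One possible workaround, not proposed by anyone in the thread, is to feed the model a closeness value 1 / (1 + d) instead of the raw distance d. Unreachable anchors then map to exactly 0.0 and stay distinguishable from every reachable one. A minimal sketch with made-up distances:

```python
def closeness(dists):
    """Map each distance d to 1 / (1 + d); inf maps to exactly 0.0."""
    return [1.0 / (1.0 + d) for d in dists]

row = [4.0, 2.0, float("inf"), 0.0]  # made-up distances
print(closeness(row))  # [0.2, 0.3333333333333333, 0.0, 1.0]
```

This keeps all 156 features finite and bounded in [0, 1], at the cost of compressing the differences between large distances.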