Commit 42c04d2

word2vec cheat sheet added to the notebook
1 parent aeb37c9 commit 42c04d2

1 file changed: 31 additions, 29 deletions

01_natural_language_processing_fundamentals/01_01_word2vec_with_sub_sampling_and_negative_sampling_in_pytorch.ipynb

@@ -26,6 +26,8 @@
 "[![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/SkalskiP/vlms-zero-to-hero)\n",
 "[![arXiv](https://img.shields.io/badge/arXiv-1310.4546-b31b1b.svg)](https://arxiv.org/abs/1310.4546)\n",
 "\n",
+"![word2vec-cheat-sheet](https://github.com/user-attachments/assets/aedfbbe9-c645-4818-9c9b-c9f1ba04d522)\n",
+"\n",
 "This notebook introduces Word2Vec, a powerful method for understanding the relationships between words by learning their \"distributed representations.\" Originally proposed by Mikolov et al. in their influential paper \"Distributed Representations of Words and Phrases and Their Compositionality\", Word2Vec has become a cornerstone of natural language processing (NLP). By representing words as vectors in a high-dimensional space, Word2Vec captures both semantic (meaning-based) and syntactic (grammar-based) relationships, enabling applications like machine translation, sentiment analysis, and text similarity.\n",
 "\n",
 "In this notebook, we’ll walk through every step of building and training the Word2Vec model using the Skip-Gram architecture. We'll start by preparing the dataset, learning how to handle common issues like overly frequent words, and explore how to create training samples. Using negative sampling—a key optimization trick introduced in the original paper—we'll efficiently train our model on large text data. Finally, we’ll evaluate the learned word vectors by finding similar words and visualizing them in 2D with t-SNE. Whether you’re new to NLP or looking for a practical introduction to Word2Vec, this notebook offers a hands-on way to understand one of the most important ideas in NLP.\n",
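The steps the notebook introduction lists (handling overly frequent words, creating Skip-Gram training samples, negative sampling, and finding similar words) can be illustrated with a short, framework-free sketch. This is a hypothetical sketch in plain Python, not the notebook's PyTorch implementation: the function names `subsample`, `skipgram_pairs`, `noise_distribution`, `neg_sampling_loss` and `cosine`, and the default threshold `t=1e-5`, are assumptions. The discard probability 1 - sqrt(t / f(w)) and the unigram^(3/4) noise distribution follow the formulas given in Mikolov et al. (arXiv:1310.4546).

```python
import math
import random
from collections import Counter


def subsample(tokens, t=1e-5, rng=random):
    """Drop each occurrence of word w with probability max(0, 1 - sqrt(t / f(w))),
    where f(w) is the relative frequency of w in `tokens` (hypothetical helper)."""
    counts = Counter(tokens)
    total = len(tokens)
    kept = []
    for w in tokens:
        p_discard = max(0.0, 1.0 - math.sqrt(t / (counts[w] / total)))
        if rng.random() >= p_discard:
            kept.append(w)
    return kept


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every context word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def noise_distribution(tokens, power=0.75):
    """Unigram distribution raised to `power` (3/4 in the paper), renormalized to sum to 1."""
    weights = {w: c ** power for w, c in Counter(tokens).items()}
    z = sum(weights.values())
    return {w: wt / z for w, wt in weights.items()}


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _log_sigmoid(x):
    # log(sigmoid(x)) computed without overflow for large |x|.
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def neg_sampling_loss(v_center, u_context, u_negatives):
    """Negative-sampling loss for one (center, context) pair and k negative words:
    -log sigmoid(u_o . v_c) - sum_k log sigmoid(-u_k . v_c)."""
    loss = -_log_sigmoid(_dot(u_context, v_center))
    for u_k in u_negatives:
        loss -= _log_sigmoid(-_dot(u_k, v_center))
    return loss


def cosine(a, b):
    """Cosine similarity, the usual score for 'most similar word' lookups."""
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))
```

For example, `skipgram_pairs("the quick brown fox".split(), window=1)` begins with `('the', 'quick')`, `('quick', 'the')`, `('quick', 'brown')`. In a PyTorch version, the per-pair dot products would typically be batched with two `nn.Embedding` tables (center and context vectors) and `torch.nn.functional.logsigmoid`, which plays the role of `_log_sigmoid` here.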
@@ -52,7 +54,7 @@
 "metadata": {
 "id": "HOUEFSssaXGD"
 },
-"execution_count": 1,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -87,7 +89,7 @@
 "metadata": {
 "id": "S-5eGpS3yjwx"
 },
-"execution_count": 2,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -113,7 +115,7 @@
 "id": "G0Oam2UXmwP5",
 "outputId": "693a3029-b21b-468a-fa97-e81c2f74295d"
 },
-"execution_count": 3,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -190,7 +192,7 @@
 "id": "GE2HFxz9ohvd",
 "outputId": "b302dac1-234a-43e5-9153-ef764e79a71f"
 },
-"execution_count": 4,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -214,7 +216,7 @@
 "id": "ON3uBUvPz4fh",
 "outputId": "d5acca2a-5cce-4733-a778-fd5aee994634"
 },
-"execution_count": 5,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -254,7 +256,7 @@
 "metadata": {
 "id": "_Kt05pMEJzuf"
 },
-"execution_count": 6,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -269,7 +271,7 @@
 "metadata": {
 "id": "bNqWJI9e2uWv"
 },
-"execution_count": 7,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -286,7 +288,7 @@
 "id": "86nMifG_MCDc",
 "outputId": "dc061280-fd48-4a86-eece-140717da4744"
 },
-"execution_count": 8,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -330,7 +332,7 @@
 "metadata": {
 "id": "Yvp7PFyp42Uk"
 },
-"execution_count": 9,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -349,7 +351,7 @@
 "id": "aucpp0irrRLE",
 "outputId": "fa15d8dd-f5a5-45ba-8ca5-ecaa65e473f6"
 },
-"execution_count": 10,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -374,7 +376,7 @@
 "id": "fcAZgAwq8ITj",
 "outputId": "3e3e8128-49be-4129-ad41-8992c7a3a0a2"
 },
-"execution_count": 11,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -430,7 +432,7 @@
 "id": "mteiUA79JXS4",
 "outputId": "91a3a20a-6513-4633-d855-566fb12077d6"
 },
-"execution_count": 12,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -485,7 +487,7 @@
 "metadata": {
 "id": "GH456SP3wKBN"
 },
-"execution_count": 13,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -511,7 +513,7 @@
 "metadata": {
 "id": "wCcRlq2ZJpiR"
 },
-"execution_count": 14,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -526,7 +528,7 @@
 "id": "v7EQhXpZJr-i",
 "outputId": "b597fd64-0e7d-4837-8b99-b566871176d6"
 },
-"execution_count": 15,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "execute_result",
@@ -552,7 +554,7 @@
 "id": "bPZRDgPNJuop",
 "outputId": "f2292d4f-845b-452a-dddc-8b3b3c43b808"
 },
-"execution_count": 16,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "execute_result",
@@ -588,7 +590,7 @@
 "metadata": {
 "id": "dt2cynm9G3Q5"
 },
-"execution_count": 17,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -609,7 +611,7 @@
 "id": "UDleRBIy4LxW",
 "outputId": "84046955-abca-45b4-cfaa-980ecb56ea4d"
 },
-"execution_count": 18,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -675,7 +677,7 @@
 "metadata": {
 "id": "gcUjF0ewHrQO"
 },
-"execution_count": 19,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -709,7 +711,7 @@
 "id": "Q2o8dD5BPbu7",
 "outputId": "5f0126a4-79c7-4d5b-d0b2-589ccae4cd75"
 },
-"execution_count": 20,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -777,7 +779,7 @@
 "metadata": {
 "id": "Zy34kv1hS1Gm"
 },
-"execution_count": 21,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -806,7 +808,7 @@
 "metadata": {
 "id": "BBuZPBc5bB9a"
 },
-"execution_count": 22,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -826,7 +828,7 @@
 "metadata": {
 "id": "prSzVTX7jBmZ"
 },
-"execution_count": 23,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -908,7 +910,7 @@
 "id": "u9HXcoYXOdM7",
 "outputId": "a01b1227-083c-470f-871b-289068a893ae"
 },
-"execution_count": 24,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -1068,7 +1070,7 @@
 "id": "s35CyixLd9L_",
 "outputId": "99cd808d-2111-48e1-8474-664a658f4572"
 },
-"execution_count": 25,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "display_data",
@@ -1105,7 +1107,7 @@
 "metadata": {
 "id": "vhBtbToF-dLD"
 },
-"execution_count": 26,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -1132,7 +1134,7 @@
 "id": "iYR_PsW1RPVn",
 "outputId": "8f3e8da7-4bab-4b69-c10e-25fab39807fe"
 },
-"execution_count": 27,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -1224,7 +1226,7 @@
 "metadata": {
 "id": "jhlfZDEqS5Q8"
 },
-"execution_count": 28,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -1272,7 +1274,7 @@
 "id": "xOx7PjxBfoL5",
 "outputId": "84d00d15-2f71-4904-e845-9cc83fdf4e3c"
 },
-"execution_count": 31,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "display_data",
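Apart from the two lines added in the first hunk, every hunk in this commit makes the same change: a code cell's `"execution_count"` goes from an integer to `null`, which is how a notebook records a cell that has not been run, while the cell's outputs stay in place. A change of this shape can be produced with a small script over the notebook JSON. The sketch below is an assumption about how one might do it, not the tooling used for this commit; it relies only on Python's standard `json` module, and the function names `clear_cell_execution_counts` and `clear_notebook_file` and the `indent` default are hypothetical.

```python
import json


def clear_cell_execution_counts(nb):
    """Set the cell-level "execution_count" of every code cell to None (null in JSON).

    Outputs, including any counters stored inside "execute_result" outputs,
    are left untouched. Returns the number of cells that changed.
    """
    changed = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code" and cell.get("execution_count") is not None:
            cell["execution_count"] = None
            changed += 1
    return changed


def clear_notebook_file(path, indent=2):
    """Rewrite an .ipynb file in place with cell execution counts cleared.

    `indent` is an assumption: match the file's existing indentation so the
    resulting diff only touches the execution_count lines.
    """
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    changed = clear_cell_execution_counts(nb)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=indent, ensure_ascii=False)
        f.write("\n")
    return changed
```

Keys are written back in their original order (no `sort_keys`), since the hunks above show a `"metadata"`, `"execution_count"`, `"outputs"` order that sorting would rearrange.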
