Cristian Cardellino
Notes of a Computer Scientist
Python with ggplot
Jupyter Notebooks combining Python and R
(Original post available at Medium. This is for archiving.)
Disclaimer: This post assumes you have some familiarity with ggplot2 (and, of course, Python, R, and Jupyter). If you need a quick catch-up on the ggplot2 library, I recommend the ZevRoss cheatsheet.
Spanish Billion Words Corpus and Embeddings
So, it has been a year and a half since my last post. Even though I kind of updated my page to be a blog from the root, shame on me.
This blog post, however, is not related to what I did in the previous ones. I promise that someday I will continue with my Python to Scala tutorials, but for now you’ll have to settle for this.
Since I am a PhD Student in Natural Language Processing and a native speaker of the Spanish language, I like to do my research in this language. The problem is that Spanish, unlike English, doesn’t have that many resources.
In the last year I have been working and researching in the fields of deep learning and word embeddings. The problem with word embeddings, especially those generated by neural network methods like word2vec, is that they require a great amount of unannotated data.
Most of the work I have seen on creating Spanish word embeddings uses Wikipedia, which is a big corpus, but not that big. So I decided to contribute to the world of word embeddings by first releasing a corpus big enough to train some decent word embeddings, and then by releasing some embeddings I created on my own.
This is why I am now releasing the Spanish Billion Words Corpus and Embeddings, a resource for the Spanish language that offers a big corpus (of nearly 1.5 billion words) and a set of word vectors (or embeddings) trained on this corpus.
Feel free to use it, as it is released under a Creative Commons BY-SA license.
From Python to Scala (VII): Functions (II)
Hello again! Nice to see you decided to come back. If you checked my previous post, you know that functions are quite an important matter in the Scala language.
Last time, while talking about recursion, I wasn’t able to cover all the topics about functions, so I decided to dedicate yet another post to them. You can call it “advanced functions”, but I don’t think what I’m going to show here is that “advanced”.
You are welcome to read some more on functions in this new blog post.
Arguments
Default Values
Following the Python Tutorial, I’ll talk a little about this.
Default argument values in Scala are very similar to Python’s. The difference lies in the static types: you’ll have to explicitly declare the type of the argument:
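As a minimal sketch of what such a declaration can look like (the function `greet` and its parameter names are hypothetical, made up for illustration and not taken from the original post):

```scala
// Hypothetical example: `greet` and its parameters are illustrative only.
// Every parameter needs an explicit type; defaults follow the `=` sign.
def greet(name: String, greeting: String = "Hello", punctuation: String = "!"): String =
  greeting + ", " + name + punctuation

// Positional arguments are assigned in declaration order.
val onlyName = greet("Ana")                 // both defaults are used
val twoArgs = greet("Ana", "Hola")          // overrides the first default

// Named arguments can be passed in any order.
val named = greet(punctuation = "?", name = "Ana")
```

Here `onlyName` is `"Hello, Ana!"`, `twoArgs` is `"Hola, Ana!"`, and `named` is `"Hello, Ana?"`.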
As you can see, there is no restriction on how you pass the arguments, but if you don’t explicitly name the parameter you are passing, Scala will use the order of the arguments to assign them.
From Python to Scala (VI): Functions
Welcome to another post in my series of tutorials. As you can see (if you have been following my tutorials since I started them), I changed the environment of my blog and now use Octopress to make writing posts easier (it has very nice features such as automatic categories and a blog archive).
This time we will be exploring one of the most powerful things Scala offers as a functional programming language: functions, the core concept of this paradigm.
This concept is quite important, and I’m sure I won’t be able to explain the full potential of Scala functions, as I’m not a master of the functional programming paradigm. Yet, I’ll do my best. However, it is important that you take a tutorial or course on Scala’s functional programming (I highly recommend Martin Odersky’s Functional Programming Principles in Scala).
Functions Basics
Scala functions are declared using the same reserved word that Python uses: def. As with all of Scala’s control flow instructions, the scope of the function is defined either by the instruction that immediately follows or by a block enclosed in curly braces: { and }.
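Both forms can be sketched as follows (the names `square` and `sumOfSquares` are hypothetical, chosen only for this illustration):

```scala
// The body is the single expression that follows the `=` sign.
def square(x: Int): Int = x * x

// The body is a block in curly braces; the value of its last
// expression is what the function returns.
def sumOfSquares(a: Int, b: Int): Int = {
  val a2 = square(a)
  val b2 = square(b)
  a2 + b2
}
```

For instance, `sumOfSquares(3, 4)` evaluates to `25`.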
Functions in Scala are actually values assigned to a symbol (just like a val or a var), so naturally they have a type. The type of a function is defined as a list of parameters of some types returning a value of some type (it can be the same or a different one). In basic terms, this means that every parameter of a function must have an explicit type (the compiler cannot infer it on its own and will report an error if you don’t declare it). However, the return type can be left implicit for the compiler to infer:
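A small sketch of these rules, using hypothetical names (`double`, `describe`, `increment`) that are not from the original post:

```scala
// Parameter types are mandatory; the return type (Int) is inferred.
def double(x: Int) = x * 2

// The same kind of declaration with the return type written out.
def describe(x: Int): String = "value: " + x

// A function stored in a val; its type is written Int => Int.
val increment: Int => Int = (x: Int) => x + 1
```

Calling `double(21)` and `increment(41)` both give `42`, and `describe(7)` gives `"value: 7"`.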
From Python to Scala (V): Control Flow Tools
Ok, after a short period of laziness, I’m back for more. I warned you about my activity but, to be fair, it’s been a busy couple of weeks at work.
However, before starting, I want you to know that there is an upcoming course on Functional Programming Principles in Scala in 25 days (it starts on September 15th). You can find more information about it (or even enroll in it) at Coursera. The course is taught by Martin Odersky, the creator of Scala, so you are in good hands.
So, back to business. In this session, let’s talk about some more real programming.
Control Flow Tools
The if statement
The most basic, and probably the best-known, statement in programming is the conditional:
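A minimal sketch of a Scala conditional, wrapped in a hypothetical function `sign` so it can be tried out directly:

```scala
// Hypothetical example: `sign` is an illustrative name only.
// In Scala, `if` is an expression, so the selected branch
// is also the value the function returns.
def sign(n: Int): String = {
  if (n < 0) {
    "negative"
  } else if (n == 0) {
    "zero"
  } else {
    "positive"
  }
}
```

Note that the condition goes in parentheses and that Scala writes `else if` where Python writes `elif`.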