Data engineering and coding-related posts.
4 reasons for Agile in Analytics
Working on several projects as a Data Science consultant, I've realized the need to spread the word about project planning in this field. This is neither a defense of Agile nor an open letter criticizing project managers (PMs) who prefer other methodologies. It's more a post...
cetrulin,
6 years ago
3 min read
Struggling with Hive… What can I do?
If you work on a Big Data project, you may have experienced how slow Hive is at JOINing a couple of tables of a few TBs (well, even GBs, to be honest). The first option always seems to be using PARQUET as your default storage format and...
cetrulin,
7 years ago
3 min read
Working in a Big Data Project using the terminal
So, you have just landed on a big data project. Everybody knows how to use HDFS except you. All the data lives in a big cluster and you don't know how to access it. You are not really into graphical interfaces, so you...
cetrulin,
7 years ago
4 min read
Flattening complex XML structures into Hive tables using Spark DFs
A couple of months ago at work we faced an issue: we received XML files with structs nested in structs and arrays (which also contained structs). We normally face these issues in Hive. Our ETL guy ingests the XML into HDFS in...
cetrulin,
7 years ago
9 min read