Data engineering and coding-related posts.
Surviving the Promptpocalypse: A Guide to Better LLM Conversations
Lee Boonstra just released a beautiful 68-page Prompt Engineering whitepaper that I will summarize here. You’ve surely heard the buzz about LLMs and Agentic AI over the last couple of years. Think of them as that brilliant, slightly chaotic intern you once had. Super smart, tons...
cetrulin,
4 days ago
8 min read
4 reasons for Agile in Analytics
Working on several projects as a Data Science consultant, I’ve realized the need to spread the word about project planning in this field. This is neither an Agile apology nor an open letter criticizing project managers (PMs) who prefer other methodologies. It’s more of a...
cetrulin,
7 years ago
3 min read
Struggling with Hive… What can I do?
If you are in a Big Data project, you may have experienced how slow Hive is to JOIN a couple of tables of a few TBs (well, even GBs, to be honest). The first option always appears to be using PARQUET as your default storage engine and...
cetrulin,
7 years ago
3 min read
Working in a Big Data Project using the terminal
So, you have just landed in a big data project. Everybody knows how to use HDFS except you. All the data is in such a big cluster and you don’t know how to access it. You are not really into graphical interfaces, so you...
cetrulin,
7 years ago
4 min read
Flattening complex XML structures into Hive tables using Spark DFs
A couple of months ago at work, we faced an issue: we received XML files with structs nested in structs and arrays (which also contained structs). We usually face these issues in Hive. Our ETL guy ingests the XML into HDFS in...
cetrulin,
8 years ago
9 min read