Twitter: gaining insights from Tweets with an API for Google Cloud

Editor’s note: Although Twitter has long been considered a treasure trove of data, the task of analyzing Tweets in order to understand what’s happening in the world, what people are talking about right now, and how this information can support business use cases has historically been highly technical and time-consuming. Not anymore. Twitter recently launched an API toolkit for Google Cloud that helps developers harness insights from Tweets, at scale, within minutes. This blog is based on a conversation with the Twitter team who made this possible. The authors would like to thank Prasanna Selvaraj and Nikki Golding from Twitter for their contributions to this blog.

Businesses and brands consistently monitor Twitter for a variety of reasons: from tracking the latest consumer trends and analyzing competitors, to staying ahead of breaking news and responding to customer service requests. With 229 million monetizable daily active users, it’s no wonder companies, small and large, consider Twitter a treasure trove of data with huge potential to support business intelligence. But language is complex, and the journey toward transforming social media conversations into insightful data involves first processing large amounts of Tweets by way of organizing, sorting, and filtering them. Crucial to this process are Twitter APIs: a set of programmatic endpoints that allow developers to find, retrieve, and engage with the real-time public conversations happening on the platform.

In this blog, we learn from the Twitter Developer Platform Solutions Architecture team about the Twitter API toolkit for Google Cloud, a new framework for quickly ingesting, processing, and analyzing high volumes of Tweets to help developers harness the power of Twitter.

Making it easier for developers to surface valuable insights from Tweets

Two versions of the toolkit are currently available: the Twitter API Toolkit for Google Cloud Filtered Stream and the Twitter API Toolkit for Google Cloud Recent Search.

The Twitter API Toolkit for Google Cloud Filtered Stream supports developers with a trend detection framework that can be installed on Google Cloud in 60 minutes or less. It automates the data pipeline process to ingest Tweets into Google Cloud, and offers visualization of trends in an easy-to-use dashboard that illustrates real-time trends for configured rules as they unfold on Twitter. This tool can be used to detect macro- and micro-level trends across domains and industry verticals, and can horizontally scale to process millions of Tweets per day.

“Detecting trends from Twitter requires listening to real-time Twitter APIs and processing Tweets on the fly,” explains Prasanna Selvaraj, Solutions Architect at Twitter and author of the toolkit. “And while trend detection can be complex work, in order to categorize trends, Tweet themes and topics must also be identified. This is another complex endeavor, as it involves integrating with Named Entity Recognition (NER) and/or Natural Language Processing (NLP) services. This toolkit helps solve these challenges.”

Meanwhile, the Twitter API Toolkit for Google Cloud Recent Search returns Tweets from the last seven days that match a specific search query. “Anyone with 30 minutes to spare can learn the basics of this Twitter API and, as a side benefit, also learn about Google Cloud analytics and the foundations of data science,” says Prasanna.
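To make the Recent Search flow concrete, the Twitter API v2 endpoint the toolkit builds on can also be called directly. Here is a minimal sketch, assuming you have a Twitter developer account and have exported an app bearer token as BEARER_TOKEN; the query string is only an example:

    # Fetch up to 10 Tweets from the last 7 days that match a sample query,
    # using the Twitter API v2 Recent Search endpoint.
    curl -s "https://api.twitter.com/2/tweets/search/recent?query=cloud%20computing&max_results=10" \
      -H "Authorization: Bearer $BEARER_TOKEN"

The toolkit automates this kind of call, along with the storage and visualization steps that normally follow it.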
The toolkits leverage Twitter’s new API v2 (Recent Search and Filtered Stream) and use BigQuery for Tweet storage, Data Studio for business intelligence and visualizations, and App Engine for the data pipeline on Google Cloud Platform. “We needed a solution that is not only serverless but can also support multi-cardinality, because all Twitter APIs that return Tweets provide data encoded using JavaScript Object Notation (JSON). This has a complex structure, and we needed a database that can easily translate it into its own schema. BigQuery is the perfect solution for this,” says Prasanna. “Once in BigQuery, one can visualize that data in under 10 minutes with Data Studio, be it in a graphic, spreadsheet, or Tableau form. This eliminates friction in Twitter data API consumption and significantly improves the developer experience.”

Accelerating time to value from 60 hours to 60 minutes

Historically, Twitter API developers have often grappled with processing, analyzing, and visualizing high volumes of Tweets to derive insights from Twitter data. They’ve had to build data pipelines, select storage solutions, and choose analytics and visualization tools before they could start validating the value of Twitter data. “The whole process of choosing technologies and building data pipelines to look for insights that can support a business use case can take more than 60 hours of a developer’s time,” explains Prasanna. “And after investing that time in setting up the stack, they still need to sort through the data to see if what they are looking for actually exists.”

Now, the toolkit enables data processing automation at the click of a button because it provisions the underlying infrastructure it needs to work, such as BigQuery as the database and App Engine as the compute layer. This enables developers to install, configure, and visualize Tweets in a business intelligence tool using Data Studio in less than 60 minutes.

“While we have partners who are very well equipped to connect, consume, store, and analyze data, we also collaborate with developers from organizations who don’t have a myriad of resources to work with. This toolkit is aimed at helping them rapidly prototype and realize value from Tweets before making a commitment,” explains Nikki Golding, Head of Solutions Architecture at Twitter.

Continuing to build what’s next for developers

As they collaborated with Google Cloud to bring the toolkit to life, the Twitter team started to think about what public datasets exist within Google Cloud Platform and how they can complement some of the topics that Twitter hosts a lot of conversations about, from crypto to weather. “We thought, what are some interesting ways developers can access and leverage what both platforms have to offer?” shares Nikki. “Twitter data on its own has high value, but there’s also data resident in Google Cloud Platform that can further support users of the toolkit. The combination of Google Cloud Platform infrastructure and application as a service with Twitter’s data as a service is the vision we’re marching towards.”

Next, the Twitter team aims to place these data analytics tools in the hands of any decision-maker, in both technical and non-technical teams. “To help brands visualize, slice, and dice data on their own, we’re looking at self-serve tools tailored to the non-technical person to democratize the value of data across organizations,” explains Nikki.
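Under the hood, that kind of self-serve exploration comes down to SQL over the Tweets the toolkit lands in BigQuery. As a rough sketch of what a developer would otherwise write by hand (the project, dataset, table, and column names here are hypothetical and depend on how the pipeline is configured), an hourly Tweet count might look like this:

    # Count Tweets per hour from a hypothetical toolkit-populated table.
    bq query --use_legacy_sql=false '
      SELECT TIMESTAMP_TRUNC(created_at, HOUR) AS hour, COUNT(*) AS tweet_count
      FROM `my_project.twitter_data.tweets`
      GROUP BY hour
      ORDER BY hour DESC
      LIMIT 24'

A no-code layer would hide even this from the end user, which is where the team is headed next: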
“Google Cloud was the platform that allowed us to build the easiest low-code solution relative to others in the market so far, so our aim is to continue collaborating with Google Cloud to eventually launch a no-code solution that helps people to find the content and information they need without depending on developers. Watch this space!”
Source: Google Cloud Platform

Earn Google Cloud swag when you complete the #LearnToEarn challenge

The MLOps market is expected to grow to around $700m by 2025¹. With the Google Cloud Professional Data Engineer certification topping the list of highest-paying IT certifications in 2021², there has never been a better time to grow your data and ML skills with Google Cloud.

Introducing the Google Cloud #LearnToEarn challenge

Starting today, you’re invited to join the data and ML #LearnToEarn challenge: a high-intensity workout for your brain. Get the ML, data, and AI skills you need to drive speedy transformation in your current and future roles with no-cost access to over 50 hands-on labs on Google Cloud Skills Boost. Race the clock with players around the world, collect badges, and earn special swag!

How to complete the #LearnToEarn challenge

The challenge will begin with a core data analyst learning track. Then each week you’ll get new tracks designed to help you explore a variety of career paths and skill sets. Keep an eye out for trivia and flash challenges too! As you progress through the challenge and collect badges, you’ll qualify for rewards at each step of your journey. But time and supplies are limited, so join today and complete the challenge by July 19!

What’s involved in the challenge?

Labs range from introductory to expert level. You’ll get hands-on experience with cutting-edge tech like Vertex AI and Looker, plus data differentiators like BigQuery, TensorFlow, integrations with Workspace, and AutoML Vision. The challenge starts with the basics, then gets gradually more complex as you reach each milestone. One lab takes anywhere from ten minutes to about an hour to complete. You do not have to finish all the labs at once, but do keep an eye on start and end dates.

Ready to take on the challenge?

Join the #LearnToEarn challenge today!

1. IDC, Market Analysis Perspective: Worldwide AI Life-Cycle Software, September 2021
2. Skillsoft Global Knowledge, 15 top-paying IT certifications list 2021, August 2021
Source: Google Cloud Platform

Docker Captain Take 5 – Damian Naprawa

Docker Captains are select community members who are both experts in their field and passionate about sharing their Docker knowledge with others. “Docker Captains Take 5” is a regular blog series where we take a closer look at our Captains and ask them the same broad set of questions, ranging from what their best Docker tip is to whether they prefer cats or dogs (personally, we like whales and turtles over here). Today, we’re interviewing Damian Naprawa, who recently became a Docker Captain. He’s a Software Architect at Capgemini and is based in Mielec, Poland.

How/when did you first discover Docker?
It was a long time ago! 
I first came across Docker through blog posts, and I also participated in introductory Docker workshops (thanks to Bart & Dan!). However, I remember that at the beginning I couldn’t understand how it worked or what the benefits were from the developer’s perspective. Since I always want to not only use a technology but also understand how it works under the hood, I spent a lot of time learning and practicing.
After some time, the “aha” moment happened. I remember telling myself, “That’s awesome!”
After a couple of years, I decided to launch my own blog dedicated to the Polish community: szkoladockera.pl (in English, “Docker School”). I wanted to help others understand Docker and containers, and hoped to share this great technology across the Polish community. I still remember how difficult it was for me before that “aha” moment came, and before I really knew what I was doing when running docker run commands.
What is your favorite Docker command?
It used to be docker exec (to see the container file system or for debugging purposes), but now the winner is docker sbom.
Why? Because one of my top interests is container security. 
With docker sbom, I can see every installed package inside my final Docker image – which I couldn’t see before. Every time we use a FROM command in the Dockerfile, we’re referring to some base image. In most cases, we don’t create them ourselves, and we aren’t aware of what packages are installed on an OS level (like curl) and application level (like Log4j). There could be a lot of packages that your app doesn’t need anymore, and you should be aware of that.
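For anyone who wants to try it, here is a minimal sketch using the public nginx image as a stand-in for your own (docker sbom shipped as an experimental command in recent Docker Desktop releases, so the exact flags may still evolve):

    # List every package detected in the image, across OS and application layers.
    docker sbom nginx:latest

    # Write the SBOM as SPDX JSON so other tooling can consume it.
    docker sbom --format spdx-json nginx:latest > sbom.json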
What is your top tip for working with Docker that others may not know?
Using Docker in combination with Ngrok lets developers expose their containerized, microservices-based apps to the internet directly from their machines. It’s very helpful when we want to present what code changes we made to our teammates, stakeholders, and clients, plus how it works from a user perspective – without needing to build and publish the app in the destination environment. You can find an example here.
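A rough sketch of that workflow, with the image name and ports as placeholders, looks like this:

    # Run a containerized app locally, publishing it on port 8080.
    docker run -d -p 8080:80 nginx:latest

    # Tunnel the local port to a temporary public URL that teammates can open.
    ngrok http 8080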
What’s the coolest Docker demo you have done/seen?
I have seen and done a lot of demos. However, if I need to mention just one, there’s one I’m really proud of.

In 2021, I organized an online conference for the Polish community called “Docker & Kubernetes Festival”. During the event, I gave a talk called “Docker for Developers”, where I presented a large number of tips for working with Docker and ways to speed up developer productivity.
There were around 700 Polish community members watching it live and thousands who watched the recording.
What have you worked on in the past six months that you’re particularly proud of?
I’ve been working closely with developer teams on containerizing microservices-based apps written in Java and Python (ML). Since I used to code mostly with JavaScript and the .NET platform, it was a very interesting experience. I had to dive deeply into the Java and Python code to understand architecture and implementation details. I then advised developers on refactoring the code and smoothly migrating to containers.
What do you anticipate will be Docker’s biggest announcement this year?
Docker SBOM. It’s a game changer for me to have an overview of the packages installed in my final Docker image, both at the OS level (like curl) and the application level (like Log4j).
What are some personal goals for the next year with respect to the Docker community ?
I’d like to share more knowledge on my blog about specific technologies (like NestJS, Java, Python etc.) – how to prepare the Dockerfiles using best practices, and how to refactor apps to smoothly migrate them into containers.
What was your favorite thing about DockerCon 2022?
Since I’m working closely with development teams, everything related to microservices and speeding up developer productivity.
Looking to the distant future, what is the technology that you’re most excited about and that you think holds a lot of promise?
Containers, of course! I see a huge demand for container experts, and I predict this demand will only increase. Speaking with clients and with the students of my online courses, I’ve learned that companies have started to appreciate the benefits of containers, and they simply want them in their workflows.
Apart from that, I’m excited about web3 and NFTs. I guess there’ll also be demand for blockchain/web3 developers and security specialists in the next few years.
Rapid fire questions…
What new skill have you mastered during the pandemic?
I gave a lot of online demos and conducted a lot of webinars, but now I’m really keen to meet with people offline! I also started my podcast, More Than Containers, but I need to go back to regular recordings!
Cats or Dogs?
Both!
Salty, sour or sweet?
Salty. Nobody believes me, but I can live without sweets.
Beach or mountains?
I love to travel, discover new things, and visit new places. Life is too short to choose between beach and mountains.
Your most often used emoji?
Captain emoji! 
Source: https://blog.docker.com/feed/

AWS Direct Connect adds support for all AWS Local Zones in the United States

Today, AWS announced AWS Direct Connect support for all AWS Local Zones in the United States. Your network traffic now takes the shortest path between Direct Connect point of presence (PoP) locations and AWS resources running in Local Zones. This capability shortens the distance network traffic must travel, reduces latency, and makes applications more responsive.
Source: aws.amazon.com

Amazon SageMaker Ground Truth now supports synthetic data generation

We are excited to announce that Amazon SageMaker Ground Truth now supports generating labeled synthetic data, without requiring you to collect large amounts of manually labeled real-world data. Amazon SageMaker provides two data labeling offerings: Amazon SageMaker Ground Truth Plus and Amazon SageMaker Ground Truth. Both options let you identify raw data, such as images, text files, and videos, and add informative labels to create high-quality training datasets for your machine learning (ML) models.
Source: aws.amazon.com