matt godbolt

@mattgodbolt
9 Followers
68 Following
17 Posts
Sometime verb, lover of old hardware and getting deep understanding of what's going on in hardware. Black Lives Matter. Trans rights are human rights.
Blog: https://xania.org
Most known for: https://godbolt.org
But also: https://bbc.godbolt.org
Compiler Explorer now supports some CUDA code execution on a real GPU: https://godbolt.org/z/a6rzTbKeM - let me know what issues you find :)
Compiler Explorer - CUDA C++ (NVCC 11.7.0)

/* Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *  * Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *  * Neither the name of NVIDIA CORPORATION nor the names of its
 *    contributors may be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
 * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

/**
 * Vector addition: C = A + B.
 *
 * This sample is a very basic sample that implements element by element
 * vector addition. It is the same as the sample illustrating Chapter 2
 * of the programming guide with some additions like error checking.
 */

#include <stdio.h>
// For malloc, free, rand, exit
#include <stdlib.h>
// For fabs
#include <math.h>

// For the CUDA runtime routines (prefixed with "cuda")
#include <cuda_runtime.h>

#include "helper_cuda.h"

/**
 * CUDA Kernel Device code
 *
 * Computes the vector addition of A and B into C. The 3 vectors have the same
 * number of elements numElements.
 */
__global__ void vectorAdd(const float *A, const float *B, float *C,
                          int numElements) {
  int i = blockDim.x * blockIdx.x + threadIdx.x;

  if (i < numElements) {
    C[i] = A[i] + B[i] + 0.0f;
  }
}

/**
 * Host main routine
 */
int main(void) {
  // Error code to check return values for CUDA calls
  cudaError_t err = cudaSuccess;

  // Print the vector length to be used, and compute its size
  int numElements = 5000;
  size_t size = numElements * sizeof(float);
  printf("[Vector addition of %d elements]\n", numElements);

  // Allocate the host input vector A
  float *h_A = (float *)malloc(size);

  // Allocate the host input vector B
  float *h_B = (float *)malloc(size);

  // Allocate the host output vector C
  float *h_C = (float *)malloc(size);

  // Verify that allocations succeeded
  if (h_A == NULL || h_B == NULL || h_C == NULL) {
    fprintf(stderr, "Failed to allocate host vectors!\n");
    exit(EXIT_FAILURE);
  }

  // Initialize the host input vectors
  for (int i = 0; i < numElements; ++i) {
    h_A[i] = rand() / (float)RAND_MAX;
    h_B[i] = rand() / (float)RAND_MAX;
  }

  // Allocate the device input vector A
  float *d_A = NULL;
  err = cudaMalloc((void **)&d_A, size);

  if (err != cudaSuccess) {
    fprintf(stderr, "Failed to allocate device vector A (error code %s)!\n",
            cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }

  // Allocate the device input vector B
  float *d_B = NULL;
  err = cudaMalloc((void **)&d_B, size);

  if (err != cudaSuccess) {
    fprintf(stderr, "Failed to allocate device vector B (error code %s)!\n",
            cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }

  // Allocate the device output vector C
  float *d_C = NULL;
  err = cudaMalloc((void **)&d_C, size);

  if (err != cudaSuccess) {
    fprintf(stderr, "Failed to allocate device vector C (error code %s)!\n",
            cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }

  // Copy the host input vectors A and B in host memory to the device input
  // vectors in device memory
  printf("Copy input data from the host memory to the CUDA device\n");
  err = cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);

  if (err != cudaSuccess) {
    fprintf(stderr,
            "Failed to copy vector A from host to device (error code %s)!\n",
            cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }

  err = cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

  if (err != cudaSuccess) {
    fprintf(stderr,
            "Failed to copy vector B from host to device (error code %s)!\n",
            cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }

  // Launch the Vector Add CUDA Kernel
  int threadsPerBlock = 256;
  int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;
  printf("CUDA kernel launch with %d blocks of %d threads\n", blocksPerGrid,
         threadsPerBlock);
  vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements);
  err = cudaGetLastError();

  if (err != cudaSuccess) {
    fprintf(stderr, "Failed to launch vectorAdd kernel (error code %s)!\n",
            cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }

  // Copy the device result vector in device memory to the host result vector
  // in host memory.
  printf("Copy output data from the CUDA device to the host memory\n");
  err = cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

  if (err != cudaSuccess) {
    fprintf(stderr,
            "Failed to copy vector C from device to host (error code %s)!\n",
            cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }

  // Verify that the result vector is correct
  for (int i = 0; i < numElements; ++i) {
    if (fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5) {
      fprintf(stderr, "Result verification failed at element %d!\n", i);
      exit(EXIT_FAILURE);
    }
  }

  printf("Test PASSED\n");

  // Free device global memory
  err = cudaFree(d_A);

  if (err != cudaSuccess) {
    fprintf(stderr, "Failed to free device vector A (error code %s)!\n",
            cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }

  err = cudaFree(d_B);

  if (err != cudaSuccess) {
    fprintf(stderr, "Failed to free device vector B (error code %s)!\n",
            cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }

  err = cudaFree(d_C);

  if (err != cudaSuccess) {
    fprintf(stderr, "Failed to free device vector C (error code %s)!\n",
            cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }

  // Free host memory
  free(h_A);
  free(h_B);
  free(h_C);

  printf("Done\n");
  return 0;
}

Second read through, I love this book :)
EDIT: DO NOT BOOST!

This post is nearly a year old, and boosting will simply spread misinformation. Mastodon now has substantially more funding, and has enough scale to handle its many users.

I would delete this post, but my server won't let me, so I've edited it to say: please do NOT BOOST!

---
Right now Mastodon is receiving only approximately $21,000/month through Patreon.

This is not enough to handle the 1 million new accounts that will be made this week.

Currently, only 4,720 patrons are donating to Mastodon.

However, if everyone chips in $2/month, this will ensure the continued survival of Mastodon!

Be a hero! Donate now! https://www.patreon.com/mastodon
Get more from Mastodon on Patreon

Creating Mastodon

Patreon

Wherein I propose that C++ initialize all stack variables to zero, preventing ~10% of CVEs.

Cost: none.

🔗 https://wg21.link/P2723R0 🔗

P2723R0: Zero-initialize objects of automatic storage duration

If you download your #Twitter archive, it arrives wrapped as a static HTML page, which isn't very useful for doing anything with. Worse, it requires the original account to still be active for useful things like enlarging the images, since they use t.co links.

So here's a #Python script to convert a Twitter archive to #markdown or other formats: https://github.com/timhutton/twitter-archive-parser

Now you can archive your tweets in any way you want.

GitHub - timhutton/twitter-archive-parser: Python code to parse a Twitter archive and output in various ways

GitHub
The unexpected pleasure of discovering a weird problem you were trying to solve is probably some kind of NP and you're not actually rubbish after all
I just figured out what creating a Mastodon account (and picking a server) reminded me of: An unfamiliar video game where the first thing you have to do is choose character attributes, but you have no idea what effects they have.

I promised something good for #BandcampFriday so here it is - you can now order a #chiptune Pretty Eight Machine #vinyl LP direct from me (There are records at stores in Baltimore, Portland, Nyack, and York/Lancaster tho if you want to get one local!)

Check the first item at http://inversephase.bandcamp.com/merch

Inverse Phase

Chiptune/8-bit music and indie game composer. I also run an electronic entertainment museum called Bloop.

Alright, still working this out. Was recommended https://wordsmith.social/elilla/a-futuristic-mastodon-introduction-for-2021 to start ... and it seems I've fallen at the first hurdle in making a @mastodon.social account :D
A futuristic Mastodon introduction for 2021:

Focusing on things that come up frequently and I don’t see explained that often. Here’s the lede: You can’t ever see or search everythin...

elilla & friends’ very occasional blog thing
So, this is clunky twitter!