From 17b6c7ba3941432b49fbc271b057433cd2f77e0c Mon Sep 17 00:00:00 2001 From: Bob Feretich Date: Fri, 20 Jul 2018 09:55:33 -0600 Subject: [PATCH] drivers/mmcsd/mmcsd_sdio.c: Fix an error that was causing SDIO multiple block transfers from achieving full performance. Also adds a feature controlled by CONFIG_SDIO_BLK_CALLBACK that bypasses the normal mechanism for obtaining sector data for transfer. --- drivers/mmcsd/Kconfig | 10 ++ drivers/mmcsd/README.txt | 140 ++++++++++++++++++ drivers/mmcsd/mmcsd_sdio.c | 291 +++++++++++++++++++++++++++++++++++-- drivers/mmcsd/mmcsd_sdio.h | 33 ++++- 4 files changed, 457 insertions(+), 17 deletions(-) create mode 100644 drivers/mmcsd/README.txt diff --git a/drivers/mmcsd/Kconfig b/drivers/mmcsd/Kconfig index 22e6588d16..69acf32f6f 100755 --- a/drivers/mmcsd/Kconfig +++ b/drivers/mmcsd/Kconfig @@ -129,4 +129,14 @@ config SDIO_BLOCKSETUP number of blocks. Others just work on the byte stream. This option enables the block setup method in the SDIO vtable. +config SDIO_BLK_CALLBACK + bool "SDIO block callback" + depends on !DRVR_WRITEBUFFER && !MMCSD_MULTIBLOCK_DISABLE + default n + ---help--- + Some hardware is RAM constrained and therefore unable to utilize the more + efficient Multiple Block Write (CMD25) mechanism. Selecting this option + will permit the driver to perform Multiple Block Writes from as few as one + 512-byte data buffer. See nuttx/drivers/mmcsd/README.txt for details. + endif diff --git a/drivers/mmcsd/README.txt b/drivers/mmcsd/README.txt new file mode 100644 index 0000000000..7c5a960318 --- /dev/null +++ b/drivers/mmcsd/README.txt @@ -0,0 +1,140 @@ +General SD-card Operation (Bob Feretich) +========================= + +SD-cards are not passive memory devices. They contain a microcontroller, +input/output buffers, and flash memory. Data being written to an SD-card +is received by the microcontroller, CRC checked, and then placed in an input +buffer. The size of these input buffers vary, but most modern (SDHC (4GB+) and +newer) cards have at least 16 KB of input buffers and are able to store the +contents of the entire input buffer into the flash memory in one "program" +operation. This "program" operation accounts for most of the delay of the +write operation. + +When a Single Block Write (CMD24) is performed, a sector (512 bytes) of data +is transfered to the SD-card, the data is placed in an input buffer, an erased +area of flash memory is obtained/identified, and the data in the input buffer +is programmed into the flash memory. + +When a Multiple Block Write (CMD25) is performed, one or more sectors +(512 bytes each) of data are transfered to the SD-card, the data is CRC +checked and placed in the input buffer, when the buffer is full, the contents +are programmed into the flash memory. During the programming operation, +the SD-card will hold-off the transfer of additional blocks from the mmcsd +driver by becoming "busy". Then when it can accept more data, the card drops +its busy indication and the transfer of data blocks continue. When the mmcsd +driver has sent the number of blocks that it was requested to send, the mmcsd +driver issues a Stop Transmission CMD12 to the SD-card. This command causes +the SD-card to program any data remaining in the input buffer into the +flash memory. + +During a Multiple Block Write, obtaining erased flash memory areas may become +a bottleneck. (Erasing flash takes much longer than programming it.) +To eliminate this delay, the mmcsd driver issues a SET_WR_BLK_ERASE_COUNT +ACMD23 to notify the SD-card of the needed number of erased blocks in advance. + +The mmcsd driver will automatically use Multiple Block Writes when mmcsd_write +is called with nsectors greater than one, unless CONFIG_MMCSD_MULTIBLOCK_DISABLE +is defined. + +The SDIO block callback option (Bob Feretich) +============================== +The best SD-card write performance is achieved when Multiple Block Writes are +used to take advantage of the large input buffers in the SD-cards. By calling +mmcsd_write with a 16 KB buffer (32 sectors of data) I was able to consistently +achieve greater than 1.7 MB / second on a Class 10 SDHC SD-card. (stm32f7 +microcontroller with the sdio clock at 16 MHz; time is averaged over writing +100 MB of data) Performance using Single Block Writes was less than +100 KB / second. Note that new cards had much higher initial performance, but +the performance degraded to (and seems to have stabilized at) this level after +writing several gigabytes to the card. + +But even with the resources of a stm32f7 dealing with 16 KB DMA friendly +buffers is very difficult. (The stm32 DTCM memory region is 64 KB. That is +only four 16K buffers.) The designers of the SD-card architecture knew this +and made the SD-card interface flexible enough so that a microcontroller +with only a single 512-byte buffer can take advantage of Multiple Block Write +speeds. + +The feature in the SD-card interface that makes this possible is the ability +for the interface to be paused between blocks of a Multiple Block Write. +(After a block of data is transferred to the SD-card, the buffer can be +refilled and its contents transferred again as needed. Or better yet, a double +buffering architecture can be used, so while one buffer is being transferred +to the SD-card, another buffer could be being filled.) +Unfortunately, the POSIX interfaces of the operating system and the FAT +File System do not facilitate buffers being refilled or swapped in the middle +of a write operation. But there is a way to do this and make both the +operating system and the FAT File System happy. + +The POSIX write function is defined as... +ssize_t (*write)(FAR struct file *filep, FAR const char *buffer, size_t buflen); +This configuration option redefines the interpretation of from a +pointer directly to the data buffer to a pointer to a structure that contains +the callback linkage to the caller's function that provides the address of the +next block to be transferred. The itself is opaque to Nuttx. The most +Nuttx is permitted to do is check that specifies a memory address +that is valid for data transfer. + +But, why reinterpret write() when ioctl() is provided for these special +circumstances? We choose to provide a reinterpretation of write() because +file systems like fatfs do not deal with ioctl() forms of write, and this +reinterpretation permits Multiple Block Writes to utilize the file system +and therefore can generate media that can be read on other computers. + +Nuttx passes a write() upon an opened SD-card file to the fatfs file system... +ssize_t fat_write(FAR struct file *filep, FAR const char *buffer, size_t buflen); + is important for directory management and FAT cluster assignment, and +since it contains the correct length of data to be written, these functions +will perform as designed. + +If is an even multiple of 512, then the Nuttx implementation of fatfs +provides the option of data transfer directly from the user's buffer. When +this option is utilized, fatfs only checks that the address contained in + specifies a valid DMA transfer region. So if the callback linkage +structure pointed to by is located in DMA capable RAM (not +necessarily DTCM), then fatfs is happy and will pass the write to the +mmcsd driver... +ssize_t mmcsd_write(FAR struct inode *inode, FAR const unsigned char *buffer, + size_t startsector, unsigned int nsectors); + +When CONFIG_SDIO_BLK_CALLBACK is defined, then the mmcsd driver's +mmcsd_writemultiple function will interpret as a pointer to a +struct sdsector_callback_s... + typedef FAR uint8_t *(*sdsector_callback_t)( size_t ); + struct sdsector_callback_s { + sdsector_callback_t callback; + }; + +Then when the mmcsd driver is ready to transmit a sector of data to the , +SD-card it will call the provided callback function and the function is +expected to return the address of the next 512 bytes of data to be written. +If NULL is returned, then the mmcsd driver will issue a Stop Transmission +CMD12 to end the write. If the write is ended prematurely, then the blocks +already written will contain the new data, but blocks designated for erasure +may only be partially erased. + +The below measurements were made on a stm32f722 (216 MHz, 16 MHz sdio clock) +using a Patriot SDHC 4 GB SD-card. (Several GBs of data were written to the +card before it was reformatted and used for these tests. + +Sectors-per-Write Buffer_Size_Used Average_Data_Rate Longest_Delay Via_fatfs + 1 512 Bytes 50-75 KB/s* 490 ms yes + 1 512 Bytes 50-75 KB/s* 490 ms no + 32 512 Bytes 1750 KB/s 330 ms yes + 32 512 Bytes 1850 KB/s 300 ms no + 32 16K Bytes 1875 KB/s 310 ms yes + 32 16K Bytes 1925 KB/s 300 ms no +* The range shown depicts run to run variations that exceeded any noticeable + performance difference between these two rows. The single sector write tests + wrote only 10MB of data per run. Other tests wrote 100 MB of data per run. +Other measurements showed +/- 10% run to run variations. Note that new SD-card +performance was much faster than these numbers, but the performance drops off +quickly as the card is written. + +Usage notes: + +* The architecture of the SDIO adapter in the MCU must permit configuring + DMA after the CMD25 is issued. +* When using Multiple Block Writes through the file system, a Multiple Block + Write must not span a file system cluster boundary. (Cluster's in a file + may not be contiguous.) diff --git a/drivers/mmcsd/mmcsd_sdio.c b/drivers/mmcsd/mmcsd_sdio.c index 93dc1f41dd..be3ca496d2 100644 --- a/drivers/mmcsd/mmcsd_sdio.c +++ b/drivers/mmcsd/mmcsd_sdio.c @@ -1,8 +1,9 @@ /**************************************************************************** * drivers/mmcsd/mmcsd_sdio.c * - * Copyright (C) 2009-2013, 2016-2017 Gregory Nutt. All rights reserved. + * Copyright (C) 2009-2013, 2016-2018 Gregory Nutt. All rights reserved. * Author: Gregory Nutt + * Bob Feretich * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -96,7 +97,7 @@ #define MMCSD_SCR_DATADELAY (100) /* Wait up to 100MS to get SCR */ #define MMCSD_BLOCK_RDATADELAY (100) /* Wait up to 100MS to get one data block */ -#define MMCSD_BLOCK_WDATADELAY (200) /* Wait up to 200MS to write one data block */ +#define MMCSD_BLOCK_WDATADELAY (230) /* Wait up to 230MS to write one data block */ #define IS_EMPTY(priv) (priv->type == MMCSD_CARDTYPE_UNKNOWN) @@ -127,7 +128,7 @@ struct mmcsd_state_s uint8_t mode:2; /* (See MMCSDMODE_* definitions) */ uint8_t type:4; /* Card type (See MMCSD_CARDTYPE_* definitions) */ - uint8_t buswidth:4; /* Bus widthes supported (SD only) */ + uint8_t buswidth:4; /* Bus widths supported (SD only) */ sdio_capset_t caps; /* SDIO driver capabilities/limitations */ uint16_t selblocklen; /* The currently selected block length */ uint16_t rca; /* Relative Card Address (RCS) register */ @@ -143,6 +144,7 @@ struct mmcsd_state_s #else uint32_t capacity; /* Total capacity of volume (Limited to 4Gb) */ #endif + /* Read-ahead and write buffering support */ #if defined(CONFIG_DRVR_WRITEBUFFER) || defined(CONFIG_DRVR_READAHEAD) @@ -1003,6 +1005,10 @@ static int mmcsd_getR1(FAR struct mmcsd_state_s *priv, FAR uint32_t *r1) */ priv->locked = ((localR1 & MMCSD_R1_CARDISLOCKED) != 0); + + /* We must tell someone which error bits were set. */ + + fwarn("WARNING: mmcsd_getR1 returned errors: R1=%08x\n", localR1); ret = -EIO; } else @@ -1786,6 +1792,248 @@ static ssize_t mmcsd_writesingle(FAR struct mmcsd_state_s *priv, * * Description: * Write multiple, contiguous blocks of data to the physical device. + * This function expects that the data to be written will be provided + * by a user callback routine dynamically. Each time the callback + * function is called, it is expected to return with the address of the + * next sector of data to be written. + * + * This permits the system using it to work with smaller (512 byte) buffers + * and still perform efficient multi-sector mmcsd writes. + * + * Note the the SD-card spec permits delay and even card deselection + * between blocks of a CMD25 write, but the mmcsd driver will hold the + * mmcsd semaphore for this device until all promised buffer addresses + * are delivered, or the multi-sector write is canceled by the callback + * function returning a NULL pointer. + * + * If multiple tasks are accessing this device, then it is strongly + * recommended that the user callback return immediately. + * + ****************************************************************************/ + +#ifdef PERMIT_SECTOR_BUFFER_CALLBACK +#if defined(CONFIG_FS_WRITABLE) && !defined(CONFIG_MMCSD_MULTIBLOCK_DISABLE) +static ssize_t mmcsd_writemultiple(FAR struct mmcsd_state_s *priv, + FAR const uint8_t *cbparm, off_t startblock, + size_t nblocks) +{ + off_t offset; + size_t nbytes; /* Number of bytes to send on the next burst. */ + size_t curblock = 0; /* Count of blocks sent. */ + int ret; + int evret = OK; + FAR uint8_t *buffer; + FAR struct sdsector_callback_s *cbstruct + = (FAR struct sdsector_callback_s *) cbparm; + + finfo("startblock=%d nblocks=%d\n", startblock, nblocks); + DEBUGASSERT(priv != NULL && cbparm != NULL && cbstruct != NULL && + nblocks > 1); + + /* Check if the card is locked or write protected (either via software or + * via the mechanical write protect on the card) + */ + + if (mmcsd_wrprotected(priv)) + { + ferr("ERROR: Card is locked or write protected\n"); + return -EPERM; + } + + /* Request the first buffer. */ + + buffer = (*(cbstruct->callback))(curblock); + if (buffer == NULL) + { + return 0; /* Write ended with no blocks written */ + } + +#if defined(CONFIG_SDIO_DMA) && defined(CONFIG_SDIO_PREFLIGHT) + /* If we think we are going to perform a DMA transfer, make sure that we + * will be able to before we commit the card to the operation. + */ + + if ((priv->caps & SDIO_CAPS_DMASUPPORTED) != 0) + { + ret = SDIO_DMAPREFLIGHT(priv->dev, buffer, priv->blocksize); + + if (ret != OK) + { + return ret; + } + } +#endif + + /* Verify that the card is ready for the transfer. The card may still be + * busy from the preceding write transfer. It would be simpler to check + * for write busy at the end of each write, rather than at the beginning of + * each read AND write, but putting the busy-wait at the beginning of the + * transfer allows for more overlap and, hopefully, better performance + */ + + ret = mmcsd_transferready(priv); + if (ret != OK) + { + ferr("ERROR: Card not ready: %d\n", ret); + return ret; + } + + /* If this is a byte addressed SD card, then convert both the total transfer + * size to bytes and the sector start sector number to a byte offset + */ + + nbytes = 1 << priv->blockshift; + if (IS_BLOCK(priv->type)) + { + offset = startblock; + } + else + { + offset = startblock << priv->blockshift; + } + + finfo("nbytes=%d byte offset=%d\n", nbytes, offset); + + /* Select the block size for the card */ + + ret = mmcsd_setblocklen(priv, priv->blocksize); + if (ret != OK) + { + ferr("ERROR: mmcsd_setblocklen failed: %d\n", ret); + return ret; + } + + /* If this is an SD card, then send ACMD23 (SET_WR_BLK_ERASE_COUNT) just + * before sending CMD25 (WRITE_MULTIPLE_BLOCK). This sets the number of + * write blocks to be pre-erased and might make the following multiple block + * write command faster. + */ + + if (IS_SD(priv->type)) + { + + /* Send CMD55, APP_CMD, a verify that good R1 status is returned */ + + mmcsd_sendcmdpoll(priv, SD_CMD55, (uint32_t)priv->rca << 16); + ret = mmcsd_recvR1(priv, SD_CMD55); + if (ret != OK) + { + ferr("ERROR: mmcsd_recvR1 for CMD55 (ACMD23) failed: %d\n", ret); + return ret; + } + + /* Send CMD23, SET_WR_BLK_ERASE_COUNT, and verify that good R1 status is returned */ + + mmcsd_sendcmdpoll(priv, SD_ACMD23, nblocks); + ret = mmcsd_recvR1(priv, SD_ACMD23); + if (ret != OK) + { + ferr("ERROR: mmcsd_recvR1 for ACMD23 failed: %d\n", ret); + return ret; + } + } + + /* Send CMD25, WRITE_MULTIPLE_BLOCK, and verify that good R1 status + * is returned + */ + + mmcsd_sendcmdpoll(priv, MMCSD_CMD25, offset); + ret = mmcsd_recvR1(priv, MMCSD_CMD25); + if (ret != OK) + { + ferr("ERROR: mmcsd_recvR1 for CMD25 failed: %d\n", ret); + return ret; + } + + /* Loop sending a sector at a time. */ + + do + { + /* Configure SDIO controller hardware for the write transfer */ + + SDIO_BLOCKSETUP(priv->dev, priv->blocksize, 1); + SDIO_WAITENABLE(priv->dev, + SDIOWAIT_TRANSFERDONE | SDIOWAIT_TIMEOUT | SDIOWAIT_ERROR); + +#ifdef CONFIG_SDIO_DMA + if ((priv->caps & SDIO_CAPS_DMASUPPORTED) != 0) + { + ret = SDIO_DMASENDSETUP(priv->dev, buffer, nbytes); + if (ret != OK) + { + finfo("SDIO_DMASENDSETUP: error %d\n", ret); + return ret; + } + } + else +#endif + { + SDIO_SENDSETUP(priv->dev, buffer, nbytes); + } + + /* Flag that a write transfer is pending that we will have to check for + * write complete at the beginning of the next transfer. + */ + + priv->wrbusy = true; + + /* Wait for the transfer to complete */ + + evret = mmcsd_eventwait(priv, SDIOWAIT_TIMEOUT | SDIOWAIT_ERROR, + nblocks * MMCSD_BLOCK_WDATADELAY); + if (evret != OK) + { + ferr("ERROR: CMD25 transfer failed: %d\n", evret); + + /* If we return from here, we probably leave the sd-card in + * Receive-data State. Instead, we will remember that + * an error occurred and try to execute the STOP_TRANSMISSION + * to put the sd-card back into Transfer State. + */ + } + + /* Request the next buffer. */ + + if (++curblock < nblocks) + { + buffer = (*(cbstruct->callback))(curblock); + if (buffer == NULL) + { + break; /* Write ended with some blocks written */ + } + } + } + while (curblock < nblocks && evret == OK); + + /* Send STOP_TRANSMISSION */ + + ret = mmcsd_stoptransmission(priv); + if (evret != OK) + { + return evret; + } + + if (ret != OK) + { + ferr("ERROR: mmcsd_stoptransmission failed: %d\n", ret); + return ret; + } + + /* On success, return the number of blocks written */ + + return curblock; +} +#endif + +#else /* PERMIT_SECTOR_BUFFER_CALLBACK is not defined */ + +/**************************************************************************** + * Name: mmcsd_writemultiple + * + * Description: + * Write multiple, contiguous blocks of data to the physical device. + * This function expects that the data to be written is contained in + * one large buffer that is pointed to by buffer. * ****************************************************************************/ @@ -1797,6 +2045,7 @@ static ssize_t mmcsd_writemultiple(FAR struct mmcsd_state_s *priv, off_t offset; size_t nbytes; int ret; + int evret = OK; finfo("startblock=%d nblocks=%d\n", startblock, nblocks); DEBUGASSERT(priv != NULL && buffer != NULL && nblocks > 1); @@ -1866,10 +2115,10 @@ static ssize_t mmcsd_writemultiple(FAR struct mmcsd_state_s *priv, return ret; } - /* If this is an SD card, then send ACMD23 (SET_WR_BLK_COUNT) just before - * sending CMD25 (WRITE_MULTIPLE_BLOCK). This sets the number of write - * blocks to be pre-erased and might make the following multiple block write - * command faster. + /* If this is an SD card, then send ACMD23 (SET_WR_BLK_ERASE_COUNT) just + * before sending CMD25 (WRITE_MULTIPLE_BLOCK). This sets the number of + * write blocks to be pre-erased and might make the following multiple block + * write command faster. */ if (IS_SD(priv->type)) @@ -1884,9 +2133,11 @@ static ssize_t mmcsd_writemultiple(FAR struct mmcsd_state_s *priv, return ret; } - /* Send CMD23, SET_WR_BLK_COUNT, and verify that good R1 status is returned */ + /* Send CMD23, SET_WR_BLK_ERASE_COUNT, and verify that good R1 status + * is returned. + */ - mmcsd_sendcmdpoll(priv, SD_ACMD23, 0); + mmcsd_sendcmdpoll(priv, SD_ACMD23, nblocks); ret = mmcsd_recvR1(priv, SD_ACMD23); if (ret != OK) { @@ -1961,17 +2212,27 @@ static ssize_t mmcsd_writemultiple(FAR struct mmcsd_state_s *priv, /* Wait for the transfer to complete */ - ret = mmcsd_eventwait(priv, SDIOWAIT_TIMEOUT | SDIOWAIT_ERROR, - nblocks * MMCSD_BLOCK_WDATADELAY); - if (ret != OK) + evret = mmcsd_eventwait(priv, SDIOWAIT_TIMEOUT | SDIOWAIT_ERROR, + nblocks * MMCSD_BLOCK_WDATADELAY); + if (evret != OK) { - ferr("ERROR: CMD25 transfer failed: %d\n", ret); - return ret; + ferr("ERROR: CMD25 transfer failed: %d\n", evret); + + /* If we return from here, we probably leave the sd-card in + * Receive-data State. Instead, we will remember that + * an error occurred and try to execute the STOP_TRANSMISSION + * to put the sd-card back into Transfer State. + */ } /* Send STOP_TRANSMISSION */ ret = mmcsd_stoptransmission(priv); + if (evret != OK) + { + return evret; + } + if (ret != OK) { ferr("ERROR: mmcsd_stoptransmission failed: %d\n", ret); @@ -1983,6 +2244,7 @@ static ssize_t mmcsd_writemultiple(FAR struct mmcsd_state_s *priv, return nblocks; } #endif +#endif /**************************************************************************** * Name: mmcsd_flush @@ -3397,3 +3659,4 @@ errout_with_alloc: } #endif /* defined (CONFIG_MMCSD) && defined (CONFIG_MMCSD_SDIO) */ + diff --git a/drivers/mmcsd/mmcsd_sdio.h b/drivers/mmcsd/mmcsd_sdio.h index d7a3f1d0c9..f0c2f3eca0 100644 --- a/drivers/mmcsd/mmcsd_sdio.h +++ b/drivers/mmcsd/mmcsd_sdio.h @@ -1,7 +1,7 @@ /******************************************************************************************** * drivers/mmcsd/mmcsd_sdio.h * - * Copyright (C) 2009, 2011 Gregory Nutt. All rights reserved. + * Copyright (C) 2009, 2011, 2018 Gregory Nutt. All rights reserved. * Author: Gregory Nutt * * Redistribution and use in source and binary forms, with or without @@ -312,11 +312,39 @@ struct mmcsd_scr_s uint8_t sdversion; /* 59:56 SD memory card physical layer version */ uint8_t erasestate; /* 55:55 Data state after erase (1 or 0) */ uint8_t security; /* 54:52 SD security support */ - uint8_t buswidth; /* 51:48 DAT bus widthes supported */ + uint8_t buswidth; /* 51:48 DAT bus widths supported */ /* 47:32 SD reserved space */ uint32_t mfgdata; /* 31:0 Reserved for manufacturing data */ }; +#if defined(CONFIG_SDIO_BLK_CALLBACK) && !defined(CONFIG_DRVR_WRITEBUFFER) && \ + !defined(CONFIG_MMCSD_MULTIBLOCK_DISABLE) +# define PERMIT_SECTOR_BUFFER_CALLBACK +#endif + +#ifdef PERMIT_SECTOR_BUFFER_CALLBACK + +/* When CONFIG_SDIO_BLK_CALLBACK is defined and a call is made to the + * mmcsd_write function specifying greater than one sector to be written, the + * mmcsd_write function will interpret its buffer parameter as a pointer to + * the below structure. + * + * This structure contains the address of a user callback function that will + * return a pointer to the memory containing the data for next sector to be + * written. + * + * The address of this structure and all addresses returned by the callback + * function must be aligned and positioned according to the DMA transfers + * rules of the architecture being used. + */ + +typedef FAR uint8_t *(*sdsector_callback_t)(size_t); +struct sdsector_callback_s +{ + sdsector_callback_t callback; +}; +#endif + /******************************************************************************************** * Public Data ********************************************************************************************/ @@ -334,7 +362,6 @@ extern "C" * Public Functions ********************************************************************************************/ - #undef EXTERN #if defined(__cplusplus) }